SIMPLY 


ARTIFICIAL 
INTELLIGENCE 




CONSULTANT
Hillary Lamb is an award-winning science
and technology journalist, editor, and
author. She studied physics at the
University of Bristol and science
communication at Imperial College
London before spending five years as a
staff magazine reporter. She has worked
on previous DK titles, including How
Technology Works, The Physics Book,
and Simply Quantum Physics.


EDITORIAL CONSULTANT 
Joel Levy is a writer who specializes in
science and the history of science. His
writing explores both mainstream science
and weird technology, and his books
include The Infinite Tortoise: The Curious
Thought Experiments of History's Great
Thinkers, Gothic Science: The Era of
Ingenuity and the Making of Frankenstein,
and Reality Ahead of Schedule: How
Science Fiction Inspires Science Fact.


CONTRIBUTOR 
Dr. Claire Quigley is a computing scientist
who has previously worked for the
Universities of Cambridge and Glasgow.
She has contributed to the creation of
coding activities for the BBC and Virgin
Media, and written for several previous
DK titles, including Help Your Kids with
Computer Science, Computer Coding
Python Games for Kids, and Computer
Coding Python Projects for Kids.


CONTENTS 


7 


INTRODUCTION 


HISTORY 
OF ARTIFICIAL 
INTELLIGENCE 


10 AN IMITATION OF LIFE
Automata
11 DEFINING INTELLIGENCE
Multiple intelligences
12 THINKING = COMPUTING
Computationalism
13 ZEROS AND ONES
Binary code
14 STEP BY STEP
Algorithms
15 ALGORITHMS IN ACTION
Computation
16 INSTRUCTING COMPUTERS
Programs
17 THE FIRST MECHANICAL
COMPUTERS
Babbage’s machines
18 A THEORETICAL COMPUTER
Turing’s universal machine
20 AN ELECTRIC BRAIN
Neurons and computation
21 ARTIFICIAL NEURONS
Threshold logic units
22 A PROGRAMMABLE
COMPUTER
ENIAC
23 A THEORETICAL PROGRAM
Turochamp


24 A COMPUTING BLUEPRINT
Von Neumann architecture
26 TWO KINDS OF AI
Weak and strong AI
27 AI IN ACTION
Intelligent agents
28 TRIAL AND ERROR
Learning to learn
29 MIMICKING THE BRAIN
Connectionism
30 AI MODELS
Classical vs. statistical AI
31 COMPUTING POWER
Moore's law
32 RAW INFORMATION
Types of data
33 EVERYTHING, EVERYWHERE,
ALL OF THE TIME
Big data


CLASSICAL 
ARTIFICIAL 
INTELLIGENCE 


36 REPRESENTING DATA
Symbols in AI
37 FOLLOWING THE RULES
Computer logic
38 WHAT, WHEN, WHY, AND
HOW?
Kinds of knowledge
39 PRESENTING KNOWLEDGE
Knowledge representation
40 IF THIS, THEN THAT
Rules
42 THE SHORTEST ROUTE
Pathfinding
43 IMPERFECT SOLUTIONS
Heuristics
44 PERFORMING A TASK
Planning and AI
46 DEALING WITH UNCERTAINTY
Probability and AI
48 MODELING CHANGES
The Markov chain
49 MODELING UNCERTAINTY
Stochastic models
50 AUTOMATED ADVICE
Expert systems
52 HANDLING “MESSY” DATA
Messiness
54 NEATS VS. SCRUFFIES
Two fields of AI research
58 TEACHING AIs TO THINK
Machine learning
60 GAINING INSIGHT FROM
DATA
Data mining
61 TEACHING MATERIALS
Training data
62 GIVING DATA MEANING
Features and labels
64 LOOKING FOR PATTERNS
Pattern recognition
65 YES OR NO?
Decision trees




TYPES OF DATA
Classification
THE LINE OF BEST FIT
Regression
GROUPING DATA
Clustering
THE ODD ONE OUT
Anomaly detection
THE MOST LIKELY OUTCOME?
Predictions
MACHINE LEARNING WITH
“LABELED” DATA
Supervised learning
MACHINE LEARNING
WITH “RAW” DATA
Unsupervised learning
LEARNING FROM
FEEDBACK
Reinforcement learning
WORKING TOGETHER
Ensemble learning
THE AI BRAIN
Artificial neural networks
NETWORK STRUCTURE
Layers
ASSIGNING IMPORTANCE
Weighting
GOALS AND THRESHOLDS
Bias
MEASURING SUCCESS
Cost function
IMPROVING PERFORMANCE
Gradient descent
REFINING THE MODEL
The Delta rule


A ONE-WAY NETWORK
Feedforward neural networks
FINE-TUNING DATA
Backpropagation
STRUCTURED DATA
Recurrent neural networks
BUILDING A BRAIN
Deep learning
AI VS. AI
Generative adversarial networks
PROCESSING VISUAL DATA
Convolutional neural networks


USING ARTIFICIAL
INTELLIGENCE


USES OF AI
Applications of AI
RANKING
Data hierarchies
RECOMMENDING
Tailored content
DETECTING THREATS
Cybersecurity
ONLINE ATTACKS
Cyber warfare
DETECTING FRAUD
Transaction monitoring
AI IN FINANCE
Algorithmic trading
UNRAVELING PROTEINS
Medical research
SEARCHING FOR PLANETS
Astronomical research
DIGITAL DOCTORS
AI in medical diagnosis




MONITORING HEALTH
AI and healthcare
INTERNET OF THINGS
Connected devices
SMART DEVICES
Embedded AI
MONITORING SYSTEMS
AI and infrastructure
“SMART” FARMING
Precision agriculture
SENSORY AI
Machine perception
PROCESSING SOUND
Machine hearing
MIMICKING SIGHT
Computer vision
FACIAL RECOGNITION
Feature mapping
UNDERSTANDING WORDS
Natural language processing
AI INTERPRETERS
Machine translation
TALKING WITH AI
Chatbots
AI HELPERS
Virtual assistants
AI ARTISTS
Generative AI
INTELLIGENT ROBOTS
Embodied AI
AI COMPANIONS
Social robots
MOVEMENT AND MOBILITY
Physical interactions I




MANUAL DEXTERITY
Physical interactions II
DRIVERLESS CARS
Autonomous vehicles
AI AND WARFARE
Autonomous weapons


PHILOSOPHY 
OF ARTIFICIAL 
INTELLIGENCE 




HUMANLIKE AI
Artificial general intelligence
THE POINT OF NO RETURN
The technological singularity
WHERE IS CONSCIOUSNESS?
Leibniz’s question
DO SUBMARINES SWIM?
Functionalism
THE IMITATION GAME
The Turing test
INTELLIGENCE METRICS
Intelligence tests
MACHINES AND
UNDERSTANDING
The Chinese Room experiment
PHILOSOPHICAL ZOMBIES
Human vs. machine intelligence
A NEW KIND OF PERSON
AI rights and responsibilities
REPLICATING THE MIND
Multiple realizability
TRANSPARENT THINKING
Opening the box


LIVING WITH 
ARTIFICIAL 
INTELLIGENCE 




MYTH OR REALITY?
The truth about AI
GARBAGE IN, GARBAGE OUT
Data quality
PREJUDICED OUTCOMES
Hidden bias
MAKING ASSUMPTIONS
AI profiling
TRANSPARENT PROCESSING
White box AI
AN AI WORKFORCE
Technological unemployment
THE AI BALANCE
AI and equality
AN ECHO CHAMBER
Filter bubbles
THE LIMITS OF CONTROL
AI autonomy
RIGHT VS. WRONG
Ethical design
BUILT-IN ETHICS
Asimov's three laws
WHO IS TO BLAME?
AI and liability
WHAT SHOULD WE ALLOW?
AI and regulation
EXISTENTIAL RISKS
An AI dystopia
UNLIMITED REWARDS
An AI utopia


HISTORY
OF ARTIFICIAL
INTELLIGENCE


AN IMITATION OF LIFE 


An automaton is a machine that is able to operate on its own, following
a sequence of programmed instructions. Historically, most automata
were animated toys, often clockwork figures or animals, some of which
were surprisingly lifelike. Animatronics, which are typically used to
portray film or theme-park characters, are
modern electronic automata.

In AI, the word “automaton” refers to
a computer that can be programmed
to perform a specific task, such as
forecasting the stock market
or analyzing customer
behavior. The latest
AIs are highly
sophisticated, and
appear to have minds
of their own. However,
one has yet to be built that
can control its own actions.


ANDROID


An android is an automaton
that has been designed to
mimic human behavior.


10 | AUTOMATA 


DEFINING 
INTELLIGENCE 


English mathematician Alan Turing
(1912-54) devised a test that can be
used to establish whether a machine
has humanlike intelligence (see
pp.130-31). Originally, the Turing test
focused on numerical intelligence (the
ability to perform mathematical
calculations). However, scientists now
argue that since there are different
kinds of intelligence (such as artistic


intelligence for it to be considered the 

equivalent of a human being. Broadly 
speaking, there are eight kinds of 

intelligence, including sensory 

intelligence (the ability to interact 

with one’s environment) and reflective 

intelligence (the ability to reflect upon 

and modify one’s behavior). 


“I know that I
am intelligent,
because I know
that I know
nothing.”


Socrates


MULTIPLE INTELLIGENCES | 11


THINKING = COMPUTING 


The idea that all thinking, whether human or artificial, is
a form of computing (see p.15), specifically a process of
using algorithms to convert symbolic inputs into symbolic
outputs (see p.36), is known as “computationalism.”
Computationalists argue that the human brain is a
computer, and that one day an AI should therefore be
able to do anything that a brain can do. In other words,
they claim, such an AI would not merely simulate
thinking; it would have genuine, humanlike consciousness.


INPUT OUTPUT 


12 | COMPUTATIONALISM 


ZEROS AND ONES 


A binary code is a code that represents
information, such as instructions, using only
two numbers, or digits. The binary code most
commonly used in computing features the
numbers 0 and 1, each of which represents
a “bit” of information. Any number can be
converted into zeros and ones (for example,
the decimal number 12 is 1100 in binary), as
can any letter of any known alphabet. The
two digits can also represent the two states of
an electrical current (“on” or “off”), meaning
that software translated into binary code can
be read by a computer.
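
The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the letter encoding assumes a simple 8-bit code based on each character's code point, which is one of several possible schemes.

```python
def to_binary(number: int) -> str:
    """Convert a non-negative whole number into a string of 0s and 1s."""
    if number == 0:
        return "0"
    bits = []
    while number > 0:
        bits.append(str(number % 2))  # the remainder is the next bit, right to left
        number //= 2                  # move on to the next power of two
    return "".join(reversed(bits))


def letter_to_binary(letter: str) -> str:
    """Encode one character as 8 bits (assumes its code point is below 256)."""
    return format(ord(letter), "08b")


print(to_binary(12))          # 1100
print(letter_to_binary("A"))  # 01000001
```

Repeatedly dividing by two and keeping the remainders is the pencil-and-paper method for this conversion; Python's built-in `bin` function gives the same digits with a `0b` prefix.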


BINARY CODE 


Processing
An algorithm
consists of a
series of steps that
sequentially process
the input data to
give a desired
output.


The output is any
information or
data produced by
the algorithm.


STEP BY STEP 


An algorithm is a sequence of instructions for
accomplishing a task. It takes an input, such as information
or data, and processes it in a series of steps to produce a
desired result, or output. The task or process can range from
a simple calculation, or following a recipe to make a meal, to
solving complex mathematical equations. An algorithm is an
example of what mathematicians call an “effective method,”
which means it has a finite number of steps and produces
a definite answer, or output.
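
As one concrete illustration (chosen for this sketch, not taken from the book), Euclid's algorithm for finding the greatest common divisor of two positive whole numbers shows the pattern: an input, a finite series of steps, and a definite output.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm for two positive whole numbers."""
    while b != 0:          # Step 1: stop once there is no remainder left
        a, b = b, a % b    # Step 2: replace the pair with (b, remainder of a / b)
    return a               # Step 3: the output


print(greatest_common_divisor(48, 18))  # 6
```

Each pass through the loop makes the second number smaller, so the process must end after a finite number of steps, which is what makes it an effective method.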


14 | ALGORITHMS


Components
of calculation
Computations have
an input and an
output, and multiple
steps. They can vary
from simple sums to
complex equations.


ALGORITHMS IN ACTION 


A computation is a calculation that follows the steps of an
algorithm (see opposite). The most straightforward example of
computation is arithmetic calculation. For example, if you add
together a pair of three-digit numbers in your head, you follow
a series of steps, or an algorithm, to achieve this calculation.
Computations use symbols to represent numbers, but symbols
can represent almost anything else (see p.36). With the right
symbols and the right algorithms, immensely complex
computation becomes possible.


COMPUTATION | 15 


16 


MULTITASKING


A program can
run multiple sets
of instructions at
the same time.


OUTPUT


INSTRUCTING 
COMPUTERS 


A program is a sequence of instructions written in code
that enables a computer to perform one or more tasks.
Charles Babbage (see opposite) imagined the first program.
He was inspired by the design of a certain silk loom, which
had parts that moved up or down in response to a pattern
of holes punched into a card. Babbage recognized that these
holes could store instructions to operate the cogs and levers
of a machine he was designing: the “Analytical Engine.”
Modern computers work on the same principle, following
sequences of instructions, which are usually written
in binary code (see p.13).


PROGRAMS 


THE FIRST MECHANICAL
COMPUTERS


In the 19th century, the complex work of producing
numerical tables (used in navigation,
warfare, and other fields) was performed by
people known as “computers.” To avoid mistakes
caused through human error, English mathematician
Charles Babbage (1791-1871) invented what he called
the “Difference Engine,” a machine that could perform
mathematical calculations mechanically. Babbage then
designed the “Analytical Engine,” a general-purpose
calculator that could be programmed using punched
cards (see opposite), and had separate memory and
processing units. Although it was never built, the
Analytical Engine had many of the key features
of modern computers (see p.22).


BABBAGE’S MACHINES | 17


The paper tape has no
beginning or end. Each
number represents a “bit”
of information (see p.13).


which way to move and
whether to replace the digit.

18 | TURING'S UNIVERSAL MACHINE 


A THEORETICAL COMPUTER 


In 1936, English mathematician Alan Turing (1912-54) 
proposed an imaginary machine that could solve any problem 
that could be made “computable” (see p.15). In other words, 
as long as the problem could be written using symbols and 
algorithms, and translated into binary code (see p.13), his 
machine could solve it. The device consisted of a head that 
moved over a tape marked with binary information. Although 
it was never built, Turing’s Universal Machine sparked the 
computer revolution by proving that a machine could tackle 
any computable problem. 


Problem-solving machine
A “read/write” head moves back and
forth along a paper tape. Following
instructions from an algorithm, it
changes 1s to 0s, and vice versa,
depending on what has come before.
depending on what has come before, 


TURING’S UNIVERSAL MACHINE | 19 


AN ELECTRIC BRAIN 


Alan Turing (see pp.18-19) had shown that a machine
could carry out any computation (see p.15) with the right
combination of symbols. In 1943, scientist Warren McCulloch
(1898-1969) and mathematician Walter Pitts (1923-69)
demonstrated that networks of units based on human
nerve cells, or neurons, passing electric signals back and
forth, could copy a Turing machine. They suggested that
the brain might be a kind of living computer, meaning
that a program that ran on the human brain might also
run on an electric brain. This theory is known as the
principle of “multiple realizability” (see p.136).
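
To give a flavor of how neuron-like units can compute, here is a minimal Python sketch of a threshold unit in the spirit of McCulloch and Pitts. It is a simplification: the weights, thresholds, and the use of a negative weight for inhibition are choices made for this illustration, not details taken from the book.

```python
def threshold_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def AND(a, b):
    return threshold_unit([a, b], [1, 1], threshold=2)


def OR(a, b):
    return threshold_unit([a, b], [1, 1], threshold=1)


def NOT(a):
    return threshold_unit([a], [-1], threshold=0)


def XOR(a, b):
    """A small network: units wired together compute what one unit alone cannot."""
    return AND(OR(a, b), NOT(AND(a, b)))


print([XOR(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```

Logic gates like these are the building blocks of digital computers, which hints at why networks of such units can, in principle, carry out the same computations.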


20 | NEURONS AND COMPUTATION 


A PROGRAMMABLE
COMPUTER


The Electronic Numerical Integrator and Computer
(ENIAC) was an early electronic computing machine built in
the US between 1943 and 1946. Made up of over 18,000
vacuum tubes (electronic components resembling light
bulbs) and covering 1,800 sq ft (167 sq m), it calculated range
tables (a list of the angles and elevation needed to hit a
target) for artillery, giving its answers on paper punchcards.
In just 20 seconds it could complete a calculation that took
people hours using electromechanical calculators. ENIAC
was programmed by changing the arrangement of cables
that plugged into it, which took days to complete. It was the
first machine computer that could run different programs.


Making the best move
Turochamp calculated all of the
responses it could make, gave
point values to these, and then
selected the highest-value move.


A THEORETICAL PROGRAM 


In 1948, Alan Turing (see pp.18-19) and mathematician
David Champernowne (1912-2000) set out to prove that, with
the right algorithm, a computer could play a game of chess.
At the time, no electronic computer existed that could run
such an algorithm, so Turing played the role of computer
himself, performing each step of the algorithm on paper.
“Turochamp,” as they called it, was further proof that
computers (whether human or artificial) could perform
complex calculations without understanding what they
were doing, but simply by following a set of instructions.


TUROCHAMP | 23 


A COMPUTING 
BLUEPRINT 


John von Neumann (1903-57) was a
Hungarian-American scientist involved
in developing ENIAC (see p.22), the first
programmable computer. He devised a
model (see right) that established how
the main components of modern-day
computers are structured, known as
von Neumann architecture. The major
advancement was the use of a memory
unit that contained both the programs
(see p.16) and data (see p.32), making the
machines quicker and easier to reprogram
than existing ones. Information within
the memory unit feeds into a central
processing unit (CPU). Within the
CPU is a control unit that decodes the
program into instructions, which are
enacted by an arithmetic and logic unit
(ALU), using data to perform calculations
and tasks. The results of these are then
fed back into the memory unit.
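
The stored-program idea can be sketched with a toy machine in Python. Everything here is hypothetical and chosen for illustration: the four instruction names (LOAD, ADD, STORE, HALT), the single accumulator register, and the memory layout are assumptions, not a description of any real computer.

```python
def run(memory):
    """Fetch-decode-execute loop over one memory holding both program and data."""
    accumulator = 0          # working register inside the CPU
    program_counter = 0      # address of the next instruction
    while True:
        opcode, operand = memory[program_counter]   # control unit: fetch and decode
        program_counter += 1
        if opcode == "LOAD":
            accumulator = memory[operand]           # read data from memory
        elif opcode == "ADD":
            accumulator += memory[operand]          # ALU: arithmetic
        elif opcode == "STORE":
            memory[operand] = accumulator           # result fed back into memory
        elif opcode == "HALT":
            return memory


# Addresses 0-3 hold the program; addresses 4-6 hold the data.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", None),
    2,
    3,
    0,
]
print(run(memory)[6])  # 5
```

Because the program sits in the same memory as the data, reprogramming this toy machine means writing new values into the list rather than rewiring anything.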


Input devices, such as
a keyboard and mouse,
enable users to input
data into the machine.

INPUT 
DEVICE 


Structural advantage
This diagram shows von
Neumann's architecture.
Because the memory units
could be upgraded, the
machines could be made
faster and more powerful.

“Any computing machine that is
to solve a complex mathematical
problem must be ‘programmed’
for this task.”


John von Neumann


24 | VON NEUMANN ARCHITECTURE 


CENTRAL 
PROCESSING 
UNIT (CPU) 


CONTROL 
UNIT 


ARITHMETIC 
AND LOGIC 
UNIT (ALU) 


MEMORY 
UNIT 


The CPU contains the control
unit and ALU, and links to the
input and output devices.

DATA CONTROL
This controls the flow
of data within the CPU
and instructs the ALU.


OUTPUT 
DEVICE 


Output devices, such as a
monitor or printer, enable
users to view the data.


This follows instructions
from the control unit
and processes the data.


VON NEUMANN ARCHITECTURE | 25 


Inputs can come
from sensors,
such as cameras, or
from a controller's
direct instructions.


Agent interaction
An intelligent agent reacts to
and affects the environment
around it.


Sensors


AGENT


Any device that affects

AI IN ACTION


An “intelligent agent” in AI is anything that can sense,
respond to, and affect its environment, which can be
physical or digital. Examples include robots, thermostats,
and computer software programs. The agent has “sensors,”
which it uses to perceive its environment, and “actuators,”
which it uses to interact with its surroundings. The action the
agent takes depends on the specific goals that have been set
for it and on what it senses. Some agents can learn (see
pp.58-59), so that they are able to change the way
they react to conditions within their environment.
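
The sense-and-act loop described above can be illustrated with the thermostat example. This Python sketch is hypothetical: the class name, the dictionary used to stand in for the environment, and the half-degree temperature steps are all assumptions made for the illustration.

```python
class Thermostat:
    """A minimal intelligent agent: it senses temperature and acts on a heater."""

    def __init__(self, target_temperature: float):
        self.target = target_temperature   # the goal set for the agent

    def sense(self, environment: dict) -> float:
        return environment["temperature"]  # sensor reading

    def act(self, environment: dict) -> str:
        reading = self.sense(environment)
        action = "heater_on" if reading < self.target else "heater_off"
        # Actuator: the action changes the environment (assumed 0.5-degree steps).
        environment["temperature"] += 0.5 if action == "heater_on" else -0.5
        return action


room = {"temperature": 18.0}
agent = Thermostat(target_temperature=20.0)
actions = [agent.act(room) for _ in range(6)]
print(actions, room["temperature"])
```

This agent follows one fixed rule; a learning agent would additionally adjust its rule over time based on what its sensors report.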


INTELLIGENT AGENTS | 27


TRIAL AND ERROR 


Machines that can follow
simple instructions, such
as calculators that apply
mathematical rules, have
existed for decades.
Creating machines that
can “learn,” the basis of
modern AI, is far more
recent and complex. To do so,
programmers use algorithms
(see p.14) that are repeatedly
revised through trial and error
to improve their accuracy.
Like natural evolution, the
improvements made are
gradual and incremental. As
AIs become more advanced,
they are able to contribute
to their own learning,
although currently they
require human assistance.


Improved accuracy
Teaching machines to
learn means making
them more accurate
and more reliable.


28 | LEARNING TO LEARN 


MIMICKING THE BRAIN 


Connectionism is an approach to AI in which information is represented
not by symbols but by patterns of connection and activity in a network.
These patterns are known as “distributed representations,” and
computation that is done in this way is known as “parallel distributed
processing” (PDP). Connectionists believe that intelligence can be
achieved by taking simple processing units, such as artificial neurons
(see p.21), and connecting them together into huge “artificial neural
networks” (ANNs, see p.76) to allow PDP. As its name suggests, the
connectionist model is based on how the brain works, using parallel
processing across interconnected networks of cells, or neurons.


OUTPUT 


CONNECTIONISM | 29 


CLASSICAL AI 


Deduction
Classical AIs mimic
human logic: they
answer questions
by following strict
mathematical rules.


AI MODELS


The earliest forms of AI are now known as classical (or symbolic) AIs.
They were constructed according to the top-down approach, in which
computer designers first figured out the rules of symbolic reasoning
(how humans think) and built them into the AIs. Their performance was
always limited by the rigid application of human-derived rules and their
programmers’ understanding of them. In contrast, modern statistical AIs
are constructed according to the bottom-up approach. They are provided
with masses of data and machine-learning tools (see pp.58-59) that
enable them to find patterns in the data. From these patterns they are
able to build models that show how particular systems (such as financial
markets) operate under particular conditions.


STATISTICAL AI 


Modeling
Statistical AIs use
machine-learning
tools to construct
models of how
systems work.


30 | CLASSICAL VS. STATISTICAL AI


Intel microprocessors
Moore's prediction was borne
out closely until the 1990s. By the
time the Pentium microprocessor
appeared (replacing the 80486
chip), the rate of increase in
processing power was already
slowing down.


Predicted rate
of increase


NUMBER OF TRANSISTORS


History 


COMPUTING POWER 


Moore’s Law is named after Gordon Moore (1929-), the cofounder
of integrated circuit chip-maker Intel. In 1965, Moore predicted that
the number of transistors that could be fitted onto a computer chip
would double every two years. Due to advances in technology,
particularly miniaturization, this prediction was borne out for decades,
and although it has since slowed down, computing power is still
increasing each year. This means that in the foreseeable future,
if computationalism is correct (see p.12), AIs will have the same
amount of computing power as the human brain.
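
Doubling every two years compounds quickly, as a short calculation shows. The Python sketch below is purely illustrative: the function name and the starting figure of 1,000 transistors in 1970 are hypothetical round numbers, not data about any real chip.

```python
def projected_transistors(start_count: int, start_year: int, year: int,
                          doubling_period_years: float = 2.0) -> float:
    """Project a transistor count that doubles once every doubling period."""
    doublings = (year - start_year) / doubling_period_years
    return start_count * 2 ** doublings


# Ten doublings in 20 years multiply the starting count by 1,024.
print(projected_transistors(1_000, 1970, 1990))  # 1024000.0
```

The same formula with a longer doubling period gives a feel for how much a slowdown in the rate of doubling changes the long-term projection.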


MOORE'S LAW | 31 


RAW INFORMATION 


Data is information that can take many forms, such as numbers,
words, or images. In computing, data is a sequence of symbols that is
collected and processed by a computer according to its programming.
In modern computers, these symbols are the 1s and 0s of binary (or
digital) code (see p.13). This data is either “at rest” (stored physically
in a database), “in transit” (being used for a finite task), or “in use”
(constantly being updated), and it can also be shared between
computers. Data is classified according to whether it can be
measured and how this is done.


QUALITATIVE
Non-numerical data, which
can be words, images, or
audio. It cannot be measured
and must be interpreted.


32 | TYPES OF DATA 


QUANTITATIVE 
Numerical data, which can 
be counted and statistically 
analyzed. It can be measured 
and is objective.




EVERYTHING, EVERYWHERE,
ALL OF THE TIME


“Big data” is a phrase that describes data sets that are too large
to be processed by traditional forms of data-processing software.
Such data sets include massive amounts of information about people,
their behavior, and their interactions. For example, mobile phone
companies use their customers’ phones to track the movements of
billions of people, every second of every day, and they record this
information in vast data sets. Big data is widely used in AI, from training
machine-learning models (see pp.58-59) and making predictions
about the weather or future customer behavior (see pp.70-71)
to protecting against cyberattacks (see p.97).


BIG DATA | 33 


CLASSICAL
ARTIFICIAL
INTELLIGENCE


From the 1950s to the 1990s, the dominant paradigm
in AI research was classical (or “symbolic” or “logical”) AI.
This approach to AI was based on logical reasoning, using
symbols and rules (written by human programmers) to
represent concepts and the relationships between them.
Classical AI had many successes, including AIs that could
play games, hold basic conversations, and answer queries
using “expert systems.” Although statistical AI has since
overtaken classical AI, the old approach has not been
entirely abandoned; many of its techniques have been
incorporated into modern AI applications, such as natural
language processing and robotics.


REPRESENTING DATA


In AI, a “symbol” is a graphical representation
of a real-world item or concept; a simple type of
symbol is a picture. A symbol can also be a group
of other symbols, such as the letters that make
up the name of an object. In classical AI, symbols
embody the total sum of the relevant facts and
information required for the system to understand
what something is. To achieve this, data is labeled
(see pp.62-63) and attached to a symbol. The
symbol for an apple would include a wealth of
data stating what an apple is and is not.


SYMBOLS IN AI


FOLLOWING THE RULES 


Logic is the study of sound reasoning, and of the rules
that determine what makes an argument valid. In practice, logic
enables people to take statements about the world (known as
premises) and derive new information from those statements (known
as conclusions). AIs are programmed to follow strictly logical rules,
with the aim of producing reliable conclusions. One such rule is the
syllogism, which states: “If all As are Bs and all Bs are Cs, then all
As are Cs.” This simple principle enables AIs to know that all items
of a particular class will always have a particular characteristic.
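
The syllogism can be sketched as a small Python routine that derives new statements from existing ones. The representation is an assumption made for this illustration: each pair (A, B) is read as “all As are Bs,” and the function name is hypothetical.

```python
def derive_conclusions(premises):
    """Apply 'all As are Bs, all Bs are Cs, so all As are Cs' until nothing new appears."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for b2, c in list(known):
                if b == b2 and a != c and (a, c) not in known:
                    known.add((a, c))   # a new conclusion follows from two known facts
                    changed = True
    return known - set(premises)


premises = {("apples", "fruit"), ("fruit", "healthy")}
print(derive_conclusions(premises))  # {('apples', 'healthy')}
```

Because the loop repeats until no new pairs appear, longer chains of premises are followed to their end as well.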


Syllogistic logic
An AI that understands that fruit is healthy,
and that an apple is a fruit, also knows that
apples are healthy.


PREMISE
APPLES ARE FRUIT


PREMISE
FRUIT IS HEALTHY


CONCLUSION
APPLES ARE HEALTHY


COMPUTER LOGIC | 37 


WHAT, WHEN, WHY, 
AND HOW? 


AI systems use up to five kinds of knowledge in their
interactions with the world, but only two are common to
all AIs. Declarative knowledge is the most basic form and
describes statements of fact, such as “cats are mammals,”
whereas procedural knowledge instructs AIs how
to complete specific tasks. In some AIs, meta-, heuristic
(see p.43), and structural knowledge provide further
information that enables them to solve problems.


38 | KINDS OF KNOWLEDGE 


PRESENTING 
KNOWLEDGE 


In order for an AI to understand information
correctly, the information must be presented to it
very clearly. There are four main ways of doing this.
“Logical representation” uses the exact words of a natural
language (or symbols to represent them). “Semantic
representation” ensures that the individual meanings
within the information are connected in a formal, logical
way. “Frame representation” involves presenting the
information in a tabular format, with facts allocated
to individual “slots.” Finally, “production rules” are the
instructions that state what conclusions an AI can deduce
from the information it is supplied with (see p.37).


Logical representation
Statements of information are clear,
logical, and unambiguous.

Semantic representation
The relationships and connections between
facts within the information are made clear.

Frame representation
Facts are presented in a simple table
whose slots contain details.

Production rules
When an “IF” statement is true, a “THEN”
statement can be deduced from it.


KNOWLEDGE REPRESENTATION | 39 


IF STATEMENTS


IF THIS, THEN THAT 


A rule-based AI system uses instructions, consisting of “IF-THEN”
statements, to draw conclusions based on an initial set of facts.
In its simplest form, an IF-THEN statement says to the system:
“If this condition is true for the current facts, then do this; if it
is false, do nothing.” Adding an “ELSE” option allows for more
complicated statements: “If this is true, then do this; otherwise
(else), do that.” Rule-based systems are predictable, reliable, and
“transparent,” meaning it is easy to see which rules the AI applies.
However, rule-based AIs cannot “learn” by adding to their store
of rules and facts without human intervention.
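
A rule-based system of this kind can be sketched in Python as a list of IF-THEN(-ELSE) rules checked against a set of facts. The rules, fact names, and actions below are invented for illustration only.

```python
def apply_rules(facts, rules):
    """Check each IF-THEN(-ELSE) rule once against the current facts, in order."""
    actions = []
    for condition, then_action, else_action in rules:
        if condition(facts):
            actions.append(then_action)      # IF true: THEN do this
        elif else_action is not None:
            actions.append(else_action)      # otherwise (ELSE): do that
        # A false condition with no ELSE branch does nothing.
    return actions


# Hypothetical rules: (condition, THEN action, optional ELSE action).
RULES = [
    (lambda f: f["temperature_c"] > 30, "turn on fan", None),
    (lambda f: f["raining"], "close window", "open window"),
]

print(apply_rules({"temperature_c": 22, "raining": False}, RULES))  # ['open window']
```

The transparency mentioned above is visible here: every action in the output can be traced back to exactly one rule in the list.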


40 | RULES 


“Much of what we
do with machine
learning happens
beneath the
surface.”


RULES | 41 


THE SHORTEST ROUTE 


Pathfinding algorithms are search algorithms that are used to find
the shortest route between two points. They have many uses,
including vehicle navigation and computer gaming. The algorithm
is programmed using a weighted graph (see below) that shows all
of the possible paths available. The circles, or “nodes,” represent
waypoints, or special locations, which are joined by lines known as
“edges.” Programmers add a weight (see p.78) to the edges, which
reflects a “cost,” such as distance or time. The algorithm
calculates the weights to find the shortest path.
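
One way to search a weighted graph is sketched below in Python. It uses a lowest-cost-first search in the style of Dijkstra's algorithm, which is a swapped-in technique for this illustration and may differ from the algorithm shown in the book's diagram. The graph, its node names, and its weights are hypothetical.

```python
import heapq


def shortest_path(graph, start, goal):
    """Return (total_cost, route) for the lowest-cost route, or (inf, []) if none exists."""
    queue = [(0, start, [start])]     # (cost so far, current node, route taken)
    visited = set()
    while queue:
        cost, node, route = heapq.heappop(queue)   # always expand the cheapest route
        if node == goal:
            return cost, route
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, route + [neighbor]))
    return float("inf"), []


# Hypothetical weighted graph: each edge weight is a cost such as distance or time.
GRAPH = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

print(shortest_path(GRAPH, "A", "D"))  # (8, ['A', 'C', 'B', 'D'])
```

Note that the cheapest route here passes through three edges, while a route with fewer edges costs more; the weights, not the number of steps, decide the answer.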


The weights are added
together to calculate the
cost of each route.


A* algorithm
This algorithm is used in
pathfinding, or moving
across a graph. It plots a path
between points, or nodes.

The route with the lowest cost
is the shortest route.


42 | PATHFINDING 


IMPERFECT SOLUTIONS 


Some problems can be too complex for an algorithm to solve quickly.
In such cases, an AI can do a “brute-force search,” which means to
methodically work through and evaluate every possible solution. This
is slow, however, and in some cases impossible. A more efficient
alternative is to use a “heuristic.” This practical method uses a
common-sense approach, searching for an approximate solution by
estimating a “good enough” choice at every decision point based on
the information available.


ROUTES
The red, yellow, and green routes,
which are of equal length,
are different.

DIAGONAL ROUTE
In this kind of model, diagonal
routes cannot be taken.

DECISION
Heuristics guide the AI to
estimate the best
choice at every
intersection.


Manhattan distance
The Manhattan distance heuristic maps
routes by calculating squares moved
vertically and horizontally. It can be used
to plot a path in an area with a grid
system, such as Manhattan in New York.
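
The Manhattan distance and its use as a “good enough” guide can be sketched in Python. The sketch assumes an open grid with no obstacles, and the function names and coordinates are hypothetical.

```python
def manhattan_distance(a, b):
    """Squares moved horizontally plus squares moved vertically between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_route(start, goal):
    """At each intersection, step to the neighboring square the heuristic rates closest."""
    route = [start]
    current = start
    while current != goal:
        x, y = current
        neighbors = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]  # no diagonal moves
        current = min(neighbors, key=lambda square: manhattan_distance(square, goal))
        route.append(current)
    return route


print(manhattan_distance((0, 0), (3, 2)))     # 5
print(len(greedy_route((0, 0), (3, 2))) - 1)  # 5 moves
```

On an open grid this simple rule always reaches the goal in the minimum number of moves; with obstacles in the way it can get stuck, which is why a heuristic is treated as an estimate rather than a guarantee.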


HEURISTICS | 43 


PERFORMING A TASK 


Embodied AIs (see p.118), such as robots, use a technique known as
“planning” to help them solve practical problems. Planning involves
understanding the environment or location in which the task must
be performed and mapping out the actions required to complete it.
The AI must identify each step required to fulfill the task and the
optimal (lowest-cost) sequence in which to perform them (see p.42).
If the optimal sequence is not possible, for whatever reason, it must
also be able to decide the next-best alternative (see p.43). It must
also identify and avoid any actions that would prevent it
from completing its task.


Planning ahead
In order to complete its
task, the robot breaks it
down into a sequence
of individual steps.


1. Identify box

2. Locate ramp


44 | PLANNING AND AI


ENVIRONMENT 
The robot must climb onto the
platform to reach the blue box.


3. Push ramp
The ramp needs to
be adjacent to the
platform, so it must
be pushed into place.


4. Ascend ramp
The robot can now
use the ramp to move
up onto the platform
next to the blue box.


Task
The robot's task
is to push the blue
box off the end of
the platform.


5. Push block off
Once on the platform,
the robot can push the
box off the end. Its
task is now complete.


PLANNING AND AI | 45


Bayes’ theorem
The probability of one event happening, such as a
dangerous fire when there is smoke, can be worked
out from the frequency of smoke and fires.

P(A|B) = P(B|A) × P(A) / P(B)

P(A|B): The probability of event A
happening given that event
B has happened.


DEALING WITH 
UNCERTAINTY 


Most classical AIs are based on the idea that logical statements
(see p.37) are either true or false, and that there is no room for
uncertainty. However, uncertainty is an unavoidable feature of life,
and it can be incorporated into AIs using the concept of probability.
Probability is a numerical value of how likely something is to occur.
“Probabilistic reasoning” is any method of reasoning that takes
probability into account. The English statistician Thomas Bayes
(1702-61) developed a method, known today as Bayes’ theorem, of
calculating the likelihood of an event happening. Instead of figuring
out the probability of the event in isolation, Bayes’ theorem bases
probability on prior knowledge of the relevant conditions.


46 | PROBABILITY AND Al 


The probability of event B happening given that event A
has happened. For example, the likelihood that there is
smoke accompanying a dangerous fire.

The probability of event A occurring. For example,
how often dangerous fires occur.

The probability of event B happening. For example,
how often there is smoke.
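The three quantities in the captions combine as P(A|B) = P(B|A) × P(A) / P(B). The short sketch below applies that formula to the smoke-and-fire example; all three input probabilities are invented for illustration and do not come from the book.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical figures, chosen only to show the arithmetic.
p_fire = 0.01              # P(A): how often dangerous fires occur
p_smoke = 0.10             # P(B): how often there is smoke
p_smoke_given_fire = 0.90  # P(B|A): smoke accompanying a dangerous fire

p_fire_given_smoke = bayes(p_smoke_given_fire, p_fire, p_smoke)
print(round(p_fire_given_smoke, 2))  # 0.09
```

With these assumed numbers, seeing smoke raises the chance of a dangerous fire from 1 percent to about 9 percent.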


“Probability theory is nothing
more than common sense
reduced to calculation.”


PROBABILITY AND Al | 47 


If the previous day was sunny, there is a
high chance of more sunshine (70%).

According to this model, there is a lower chance that
a rainy day will be followed by ...

RAIN UNLIKELY

CLOUDY


A Markov chain is a model that describes a sequence of
possible events in which the probability of each event
depends on the state that was reached in the previous event.
The model predicts outcomes based on the rules of
probability (see pp.46-47) and using data collected on the
relevant subject. Once it has been trained (see p.61), it only
needs to know the conditions of the immediate past (the
previous state) to get the relevant information to predict
the likelihood of the next state. Markov chains have many AI
applications, from forecasting weather patterns and financial
market conditions, to use in predictive text systems.
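A weather Markov chain can be sketched as a table of transition probabilities. Only the 70 percent chance of sunshine following a sunny day comes from the figure; every other number, and the three state names, are assumptions made so that each row sums to 1.

```python
# TRANSITIONS[today][tomorrow] = probability. Mostly hypothetical values.
TRANSITIONS = {
    "sunny":  {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
}


def step(distribution):
    """Advance the probability of each state by one day."""
    nxt = {state: 0.0 for state in TRANSITIONS}
    for state, p in distribution.items():
        for target, q in TRANSITIONS[state].items():
            nxt[target] += p * q
    return nxt


def forecast(today, days):
    """Probability of each state after the given number of days."""
    dist = {state: 1.0 if state == today else 0.0 for state in TRANSITIONS}
    for _ in range(days):
        dist = step(dist)
    return dist


print(forecast("sunny", 1))
print(forecast("sunny", 2))
```

Each call to `step` looks only at the current distribution, never at earlier days, which is the defining property of the model.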


48 | THE MARKOV CHAIN 




Deterministic models
In a deterministic model, there are no random variables.
Results from a set of inputs will be related in a predictable way.

Stochastic models
A stochastic model includes random variables. Results are
much less predictable and not clearly related to each other.


MODELING UNCERTAINTY 


Stochastic models enable AIs to make predictions about
processes and situations that are affected by chance
events, such as changes in the stock market or the
growth rate of bacteria. The volatile and ever-changing
factors in these scenarios are represented by random
variables, and each is assigned a value based on the
probability of it occurring. A stochastic model then
processes thousands of combinations of variables and
produces a distribution curve that shows the probability
of different outcomes under different circumstances.
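The contrast between the two kinds of model can be sketched with bacterial growth. This is a toy simulation, not a biological model: the 10 percent hourly growth rate, its random spread, and the bucket width of 50 are all assumed values.

```python
import random
from collections import Counter


def deterministic_growth(population, rate, hours):
    """No random variables: the same inputs always give the same result."""
    for _ in range(hours):
        population *= 1 + rate
    return population


def stochastic_growth(population, hours, rng):
    """The hourly growth rate is a random variable (assumed mean 10%)."""
    for _ in range(hours):
        rate = rng.gauss(0.10, 0.05)
        population *= 1 + rate
    return population


def outcome_distribution(runs=5000, seed=0):
    """Run the stochastic model many times and count outcomes per bucket."""
    rng = random.Random(seed)
    outcomes = [stochastic_growth(100.0, 10, rng) for _ in range(runs)]
    return Counter(int(o // 50) * 50 for o in outcomes)


print(deterministic_growth(100.0, 0.10, 10))
for bucket, count in sorted(outcome_distribution().items()):
    print(bucket, count)
```

The counts per bucket approximate the distribution curve described above: many runs cluster around the deterministic answer, with fewer in the tails.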


STOCHASTIC MODELS | 49 


AUTOMATED ADVICE 


Computer programs that replicate the knowledge and reasoning
skills of human specialists are known as “expert systems.” The
information that they contain is supplied by human experts, and is
programmed into the system by a “knowledge engineer.” Each system
has three parts. The “knowledge base” contains the facts and rules
used by human experts on the topic. The “inference engine” applies
the rules to the facts in the knowledge base to deduce answers to
queries posed by users. The “user interface” accepts queries from users
and displays solutions found by the system. Expert systems are able to
answer complex questions and provide users with wider access to
expert advice. They are used in many areas, including medicine, where
they match symptoms to likely causes and appropriate treatments.
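The three parts can be sketched in a few lines. The rules below are invented placeholders, not medical advice, and the forward-chaining loop is one simple way an inference engine might work rather than a description of any particular system.

```python
# Knowledge base: hypothetical rules of the form (conditions, conclusion).
RULES = [
    ({"fever", "cough"}, "flu suspected"),
    ({"flu suspected", "short of breath"}, "see a doctor"),
]


def infer(facts):
    """Inference engine: keep applying rules until nothing new is deduced."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(facts)


def ask(symptoms):
    """User interface: accept a query and return the system's findings."""
    return sorted(infer(symptoms)) or ["no conclusion"]


print(ask({"fever", "cough", "short of breath"}))
print(ask({"fever"}))
```

Note that the second rule only fires because the first rule has already added “flu suspected” to the known facts.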


Human experts
Experts supply the knowledge and rules
within the system.

Knowledge engineer
The expert system is programmed by a
“knowledge engineer.”




50 | EXPERT SYSTEMS 


“Intelligence is not the ability to store
information, but to know where to find it.”

Albert Einstein


User
The user asks a question
and gets an answer via
the interface.

In action
The three sections of an
expert system interact to
provide answers to the user.


EXPERT SYSTEMS | 51 




HANDLING 
“MESSY” DATA 


Classical AIs (see p.30) struggle with some tasks that humans
find simple. We can program computers for reasoning-based
tasks, such as playing chess, but not for sensorimotor- and
perception-based tasks such as catching a ball or recognizing
a cat. The Austrian-Canadian programmer Hans Moravec
(1948-) argued that reasoning tasks are easy to teach to
computers because humans have already figured out the
steps that are required to complete them.

In contrast, sensorimotor and perception activities involve
unstructured, or “messy,” data that requires a lot of processing.
For humans, these tasks are largely unconscious actions,
refined over millions of years of brain evolution, but they are
difficult to break down into a series of steps that a
computer can follow.


52 | MESSINESS 




MESSINESS | 53 


NEATS VS. SCRUFFIES


In the 1970s, AI theorist Roger Schank (1946-)
noted that there are two types of AI research,
which he called “neat” and “scruffy” (see
opposite). The neat approach, which has since
become dominant, builds AIs by programming
computers to follow strict mathematical rules.
These rules enable AIs to distinguish between
different types of data, and to analyze those
data by using machine-learning algorithms
(see pp.58-59). Artificial neural networks
(ANNs, see p.76), for example, are a triumph of
the neat approach.


Neat AI
Defenders of the neat approach argue that AIs
are machines that can perform specific tasks
with complete reliability. They also claim that neat
AIs will ultimately have humanlike intelligence.

Neat researchers take their cue from physics,
building AIs whose behavior is predictable.


54 | TWO FIELDS OF Al RESEARCH 


[Text on the “scruffy” approach is largely illegible in this copy.
Recoverable content: Roger Schank (see opposite) described a
“scruffy” approach involving many kinds of models and algorithms;
Marvin Minsky described this approach as “analogical” rather than
logical; and an AI should have a kind of common sense.]


TWO FIELDS OF AI RESEARCH | 55


STATISTICAL
ARTIFICIAL
INTELLIGENCE


In the 1990s, many researchers grew frustrated by the
shortcomings of classical AI, with its focus on logic and
deductive reasoning, and began developing statistical
techniques instead. This gave rise to statistical AI, which
remains the main focus of AI research today. At the heart
of this approach is a technique known as “machine
learning.” Machine learning involves using data sets to train
AI models (including models that mimic the human brain,
known as “artificial neural networks”) to perform tasks
without requiring an engineer to program them explicitly to
do so. This approach is thriving today, due to the availability
of powerful computer hardware and large data sets.


ARTIFICIAL INTELLIGENCE
This is the science of developing machines
that can act and make decisions “intelligently.”

MACHINE LEARNING
Machine learning focuses on training computers to
perform tasks without the need for explicit programming.

DEEP LEARNING
Deep learning is the most sophisticated type of
machine learning. It requires minimal human intervention,
and uses computer models known as “artificial neural
networks” that are based on the human brain.

“Predicting the future
isn’t magic, it’s artificial
intelligence.”


TEACHING Als 
TO THINK 


Machine learning is a form of AI that enables
computer systems to learn how to perform tasks
without being explicitly programmed to do so.
Programmers can write algorithms that tell
computers precisely which steps to follow to
complete simple tasks. However, for more complex
tasks, such as recognizing faces or understanding
spoken conversations, it is incredibly difficult for
programmers to write the necessary algorithms,
and this is where machine learning comes in.
Machine-learning algorithms use collections of
sample data, known as training data (see p.61), to
build models that make predictions or choices based
on new data. There are many kinds of machine
learning, including deep learning (see p.86), in which
AIs mimic the structure and behavior of biological
brains by using artificial neural networks (see p.76).


MACHINE LEARNING | 59 


GAINING INSIGHT
FROM DATA


Data mining is a process of
uncovering patterns of information in
large data sets (sets of information)
with a view to making the data useful
for specific tasks. For example,
data-mining software can scan the
medical profiles of thousands of
people to identify those diagnosed
as being diabetic, and could inform
them about new treatments for the
condition. Data mining is a broad
discipline that increasingly uses AI
techniques to process volumes of
data that are too large for humans
to handle. Two key techniques used
are “clustering” (see p.68) and
“anomaly detection” (see p.69).




TRAINING DATA

VALIDATION DATA

TEST DATA

PREDICTIONS

Using data
Training data is used to teach an AI, during which
validation data is used to monitor its accuracy.
After teaching, the AI is assessed using test data.


TEACHING MATERIALS 


Training data is a type of data that is used during machine learning
(see pp.58-59) to teach AIs how to perform tasks accurately. It is used
by programmers to test, adjust, and fine-tune the AI (see pp.78-79)
until it gives the expected results, or outputs. “Validation data” may
also be used to assess how accurately the AI processes the training
data during the learning period. Once the AI has been trained, “test
data” is then used to assess the accuracy of its results. Machine
learning requires a large amount of training data, which may
be labeled or unlabeled (see pp.62-63).
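In practice the three kinds of data are often carved out of one collection of examples. The sketch below shows one common way to do that; the 70/15/15 proportions and the function name `split_data` are assumptions for illustration, not figures from the text.

```python
import random


def split_data(examples, train=0.70, validation=0.15, seed=0):
    """Shuffle the examples, then cut them into three non-overlapping sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validating = shuffled[n_train:n_train + n_val]
    testing = shuffled[n_train + n_val:]  # whatever remains
    return training, validating, testing


training, validating, testing = split_data(range(100))
print(len(training), len(validating), len(testing))  # 70 15 15
```

Keeping the test set separate until training has finished is what makes it a fair check of the finished model.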


TRAINING DATA | 61 


FEATURES
Leaf, taste, skin, scent, seeds, size, flesh

Tagging features
A human operator tags all of the data
describing the features of “type A”
apples. The AI learns that, together,
these features define a “type A” apple.


GIVING DATA MEANING 


A “feature” is a characteristic, such as a pattern of pixels, that
an AI can use as an input to predict a label, which becomes the
output. In supervised machine learning (see p.72), AIs learn to
associate particular features with labels by processing training
data sets (see p.61) that have already been labeled by a human
operator. For example, if an image-recognition AI trained with
labeled photographs of animals is given a photograph of an
animal with features such as white feathers, curved beak,
and crest, it will probably output a label of “cockatoo.”


62 | FEATURES AND LABELS 


LABELS 


FRUIT DATABASE

Taste: sweet, sharp
Scent: fresh, tart
Size: 2 inches/5cm
Leaf: curved
Skin: smooth, red
Seeds: inedible
Flesh: firm, juicy

Predicting labels
Knowing all of the features of a
“type A” apple, the AI can find it
in a fruit database, and identify
it with the label “Apple A.”


“A baby learns to crawl, walk, and 

then run. We are in the crawling 

stage when it comes to applying 
machine learning.” 


FEATURES AND LABELS | 63


64 | PATTERN RECOGNITION 


YES OR NO? 


A decision tree is a model of the decision-making process
used by AIs. It works by questioning data, to which the answers
can only be “yes” or “no.” One kind of decision tree is the
“classification tree.” By repeatedly posing yes/no questions,
the AI splits the “root” (data set) into ever-smaller “branches”
(subsets) that share particular features, until a single “leaf”
(conclusion) is reached, pinpointing a specific classification
within the data. Decision trees are commonly used in both
machine learning (see pp.58-59) and data mining (see p.60).
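A classification tree can be written down as nested yes/no questions. The tree below is hand-built for illustration, with made-up questions and animals; in machine learning the questions would be chosen automatically from training data.

```python
# Each branch node asks one yes/no question; each leaf is a plain string.
TREE = {
    "question": "has_feathers",
    "yes": {"question": "can_fly", "yes": "sparrow", "no": "penguin"},
    "no": {"question": "has_fur", "yes": "cat", "no": "lizard"},
}


def classify(item, node=TREE):
    """Walk from the root to a leaf, answering one question per node."""
    while isinstance(node, dict):
        answer = item.get(node["question"], False)  # missing answer = "no"
        node = node["yes"] if answer else node["no"]
    return node


print(classify({"has_feathers": True, "can_fly": False}))  # penguin
print(classify({"has_fur": True}))                          # cat
```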


The root is the starting point
of the decision tree.

The tree divides into
branches as questions
are asked of the data.

When the answer to
the yes/no question is
“no,” the branch ends.

From root to leaf
A decision tree branches
out through a series of
“nodes” that represent
the answers to yes/no
questions, until a leaf,
or conclusion, is reached.


DECISION TREES | 65 


TYPES OF DATA 


An algorithm that assigns labels to items (see pp.62-63)
and then sorts them into categories, or “classes,” is known as
a “classifier.” Through a process of supervised learning (see
p.72), AIs are taught to classify items using a labeled training
data set (see p.61), from which they learn to recognize the
patterns associated with different labels. For example, a
spam filter is taught to detect features of spam and
non-spam emails from a collection of labeled emails. Based
on this training data, the AI can automatically assign the
labels “spam” or “not spam” to new emails.
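A toy spam filter shows the idea of learning from labeled examples. The four training emails are invented, and scoring by simple word counts is a deliberately crude stand-in for the statistical methods real filters use.

```python
from collections import Counter

# Hypothetical labeled training data.
TRAINING_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]


def train(emails):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in emails:
        counts[label].update(text.split())
    return counts


def classify(text, counts):
    """Pick the label whose training emails share more words with the text."""
    scores = {label: sum(words[w] for w in text.split())
              for label, words in counts.items()}
    return max(scores, key=scores.get)  # a tie goes to the first label


counts = train(TRAINING_EMAILS)
print(classify("claim your free prize", counts))        # spam
print(classify("agenda for the team meeting", counts))  # not spam
```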


Dividing data sets
Classifiers can separate
data into regions divided
by a line known as a
decision boundary.

DATA CLASS A


66 | CLASSIFICATION 


If an AI knows a person's weight, it can predict how tall
they are based on the heights of all known people of the
same weight.

REGRESSION LINE
The line shows the relationship between the height
and weight variables.

The data points show the height and weight of a
large number of people.


THE LINE OF BEST FIT 


Regression analysis is a machine-learning process
(see pp.58-59) in which an algorithm is used to predict the
behavior of one or more variables depending on the value
of another variable. It is used in many supervised learning
applications (see p.72), particularly those that are designed
to find causal relationships between several variables. For
example, it can be used to predict what the next day's
temperature will be given today's humidity, wind speed, and
atmospheric pressure, and data about how all four variables
have behaved in the past. “Linear regression” (see above) is
the most common form of regression analysis, and is used
particularly in the fields of finance and economics.
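For a single input variable, the line of best fit can be computed directly with the least-squares formulas. The weights and heights below are invented sample values, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weight in kg, height in cm.
weights = [50, 60, 70, 80, 90]
heights = [155, 165, 172, 180, 188]

slope, intercept = fit_line(weights, heights)
predicted_height = slope * 75 + intercept  # prediction for a 75 kg person
print(round(predicted_height, 1))
```

The fitted line minimizes the total squared vertical distance between itself and the data points, which is what “best fit” means here.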


REGRESSION | 67 


GROUPING DATA 


Clustering is the process of dividing a data set into a
number of groups based on commonly shared features.
It is an unsupervised machine-learning technique (see p.73),
which means that it is performed by AIs on raw,
unlabeled training data sets. Clustering is especially useful
for gaining insights into human behavior. For example, a
company may use it to sort its customers into distinct
groups, based on their purchase histories, so that it can
target them more effectively with promotions.
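One widely used clustering method is k-means, sketched here on a single feature. The text does not name a specific algorithm, so the choice of k-means, the customer spending figures, and the starting centers are all assumptions.

```python
def k_means(values, centers, rounds=10):
    """Group values around the nearest center, then move each center."""
    for _ in range(rounds):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Each center moves to the average of its cluster (if not empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spending per customer; no labels are supplied.
spending = [10, 12, 11, 95, 100, 105]
centers, clusters = k_means(spending, centers=[0.0, 50.0])
print(centers)   # [11.0, 100.0]
print(clusters)  # [[10, 12, 11], [95, 100, 105]]
```

The algorithm is never told which customers are “low” or “high” spenders; the two groups emerge from the data alone.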


Honing research
Clustering enables
researchers to target
specific groups of items
for further analysis.

Unclustered items can be
grouped into their own,
miscellaneous, cluster.


68 | CLUSTERING 


Cluster analysis
AIs can identify
anomalies in a
data set by using
clustering (see
opposite).

The AI flags points outside of a
cluster as anomalous.


THE ODD ONE OUT 


Anomaly detection is the process of identifying unusual
(or “anomalous”) data in a data set. That is to say that the AI
looks for items that do not fit a particular pattern or model
built from its training data. Many anomalies are caused by
mistakes in the data, such as incorrectly inputted units, or an
inconsistency in the type of measurement used. In such
cases, it is important to find the anomaly so that it can be
corrected or removed from the data set. However, anomalies
can also draw attention to serious problems that lie outside
the data set, such as a software malfunction or the AI being
hacked by cybercriminals (see pp.96-97).
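A simple statistical check can stand in for the pattern-based detection described above. The sketch flags values that lie far from the average; the two-standard-deviation threshold and the temperature readings (one apparently entered in the wrong unit) are assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    average = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - average) > threshold * spread]


# Hypothetical readings in degrees Celsius; 68.0 looks like Fahrenheit.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 20.1, 19.7, 20.0, 68.0]
print(find_anomalies(readings))  # [68.0]
```

Once flagged, the reading can be corrected or removed, as the text recommends.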


ANOMALY DETECTION | 69 


THE MOST 
LIKELY 
OUTCOME? 


Machine-learning models (see pp.58-59) can make
predictions by analyzing patterns in historical data.
In AI, a prediction is the output from a model that
forecasts the chances of a particular outcome. For
example, if a customer buys a certain item online,
an AI can use data about past purchases, both
from the customer and from others, to predict
what other items they might want. Prediction in
AI does not always involve anticipating a future event.
It can also be used to make “guesses” about
events in the past and present, such as whether
a transaction is fraudulent (see p.98), or if an X-ray
indicates the presence of disease (see p.102).


70 | PREDICTIONS 


A customer purchases a product
from an online vendor (for
example, a toothbrush).

The AI builds a profile of a
customer by analyzing their
online behavior and history
of purchases.

Similar items
The AI identifies other items
frequently bought alongside the
product, both by the customer
and others.

Similar profiles
The AI compares the customer's
profile to a large number of
other profiles to find
similar matches.

Prediction
The AI predicts and then
recommends linked items that the
customer may want (for example,
toothpaste and mouthwash).

Prediction
The AI uses the purchase history
of similar profiles to predict other
items the customer might be
interested in.
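The “similar items” route can be sketched by counting which products appear in the same baskets. The baskets are invented, and counting co-purchases is a simplified stand-in for the models an online vendor would actually use.

```python
from collections import Counter

# Hypothetical purchase histories, one set of items per order.
PAST_BASKETS = [
    {"toothbrush", "toothpaste"},
    {"toothbrush", "toothpaste", "mouthwash"},
    {"toothbrush", "mouthwash"},
    {"shampoo", "soap"},
    {"toothbrush", "toothpaste", "soap"},
]


def recommend(item, baskets, top=2):
    """Suggest the items most often bought alongside the given item."""
    bought_together = Counter()
    for basket in baskets:
        if item in basket:
            bought_together.update(basket - {item})
    return [other for other, _ in bought_together.most_common(top)]


print(recommend("toothbrush", PAST_BASKETS))  # ['toothpaste', 'mouthwash']
```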


PREDICTIONS | 71 


MACHINE LEARNING 
WITH “LABELED” DATA 


Supervised learning is a type of machine learning in which
an AI is trained using a “labeled” training data set (see p.61).
Input and output data is labeled by a human so that the AI
can learn the relationship between them. The inputs, outputs,
and the rule that relates them are collectively known as a
“function.” During training, weights (see p.78) are adjusted
to make the function fit the training data. The resulting
function can be used to predict outputs based on new
inputs. Supervised learning can be used for classification
(see p.66) and regression (see p.67).


INPUT DATA

OUTPUT

Classification
The AI is able to classify
inputs (for example,
different fruits) from a
large, unlabeled data set.


72 | SUPERVISED LEARNING 


INPUT DATA + LABELS

MODEL

PREDICTION

Labeled data set
The AI is trained using labeled data (for
example, that the inputs are apples).

Test data
After training, the AI is tested on unlabeled
input data to check its performance.

Prediction
On the basis of its training data, the AI
predicts that the label of the input is “apple.”


MACHINE LEARNING 
WITH “RAW” DATA 


Unsupervised learning is used to discover hidden structures
in raw, unlabeled data sets. Although AIs do not understand
the relevance of these structures, they may still have real-world
meaning. This approach is useful in the early stages of data mining
(see p.60), to find patterns in large, unlabeled data sets, which can
then be subject to human interpretation. An in-between method,
semi-supervised learning, uses partly labeled data sets, which gives
better results than entirely unsupervised learning.


UNSUPERVISED LEARNING | 73 


LEARNING FROM 
FEEDBACK 


Reinforcement learning is an approach to machine learning
(see pp.58-59) in which an AI is taught to perform a task
through trial and error. To achieve this, the AI is programmed
to recognize “rewards” and “punishments,” meaning positive
or negative feedback, depending on whether it succeeds or
fails. The AI learns that succeeding is good and failing is bad,
and repeatedly attempts the task until it is rewarded. For
example, an autonomous vehicle trained in this way
(see p.122) will be punished (receive negative feedback)
until it learns not to go through a red traffic light.
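The red-light example can be reduced to a one-situation learning loop. This is a heavily simplified sketch: real reinforcement learning handles many situations and delayed rewards, and the reward values, learning rate, and exploration rate here are assumed.

```python
import random


def train(episodes=200, rate=0.1, explore=0.2, seed=0):
    """Learn by trial and error which action is better at a red light."""
    rng = random.Random(seed)
    values = {"stop": 0.0, "go": 0.0}  # the agent's estimate for each action
    for _ in range(episodes):
        if rng.random() < explore:
            action = rng.choice(["stop", "go"])  # occasionally try anything
        else:
            action = max(values, key=values.get)  # otherwise use best guess
        reward = 1.0 if action == "stop" else -1.0  # reward or punishment
        values[action] += rate * (reward - values[action])
    return values


values = train()
print(max(values, key=values.get))  # stop
```

Each punishment pulls the estimate for “go” down and each reward pushes “stop” up, so the agent ends up preferring to stop.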


FEEDBACK: 
REWARD OR PUNISHMENT. 


Trial and error
The AI learns to succeed in a task
through the consequences of its actions.
It will seek rewards and avoid
punishment until the task is completed.


74 | REINFORCEMENT LEARNING 


MODEL B 


MODEL C 


PREDICTION 


WORKING TOGETHER 


Ensemble learning is based on the idea that combining the outputs 
of multiple machine-learning algorithms produces a better result than 
a single model can. Using two or more models that have been built 
and trained in different ways, for example with different data sets, can 
“cancel out” their individual weaknesses and generate more accurate 
predictions. Ensemble learning can be used to “teach” a particular 
model to improve its predictive performance, but also to assess a 
model's reliability and prevent a poor one from being selected. 
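Majority voting is one simple way to combine models. The three “models” below are hand-written rules standing in for trained ones, and both they and the sample messages are invented for the example.

```python
from collections import Counter


def model_a(text):
    return "spam" if "free" in text.lower() else "not spam"


def model_b(text):
    return "spam" if "prize" in text.lower() else "not spam"


def model_c(text):
    return "spam" if text.count("!") >= 2 else "not spam"


def ensemble(text, models=(model_a, model_b, model_c)):
    """Return the label that most of the models vote for."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


print(ensemble("Free prize inside"))     # spam
print(ensemble("Free lunch on Monday"))  # not spam
```

In the second message, model A wrongly votes “spam,” but the other two outvote it, which is the “cancel out” effect described above.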


ENSEMBLE LEARNING | 75 


INPUT LAYER
Information, such as data, enters
an ANN via the input layer.

HIDDEN LAYER 1

PROCESSING LAYERS
Known as the hidden layers,
these are where data is processed
within an ANN.


THE Al BRAIN 


Artificial neural networks (ANNs) are machine-learning
models based on algorithms (see p.14). Their structure is
similar to that of the brain, consisting of interconnected
nodes (artificial neurons) that are organized into multiple
“layers.” The nodes within each layer receive, process, and
send data to the next layer in the network, until an
output, or result, is produced. Each node works like an
individual microprocessor that can be reprogrammed
to handle the data in a desired way. Using training data
(see p.61), programmers can teach an ANN to “learn”
how to give the expected results, or outcomes.


76 | ARTIFICIAL NEURAL NETWORKS 


Input layer
The input layer brings the initial data into the network.

Multiple hidden layers
Data is processed within the “hidden” layers, passing through
the network, from one layer to the next.

Output layer
The processed data leaves the network via the output layer.


NETWORK STRUCTURE 


Artificial neural networks (ANNs) are structured in “layers,” which are
collections of processing nodes that operate together. Data flows
from the nodes in one layer to those in the next. The first layer always
contains the “input,” or incoming data. Next is at least one “hidden”
layer, in which the processing takes place. These layers are hidden in
the sense that their data is not visible to a user in the way that the
network's inputs and outputs are. Finally, the resulting data arrives at
the “output” layer. All ANNs share this basic structure, but some are
more complex: recurrent neural networks (see p.85) generate
connections between nodes in the next or in previous layers, while
deep neural networks (see p.86) can have hundreds of hidden layers.
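The flow of data from layer to layer can be sketched as repeated weighted sums. The weights below are arbitrary numbers, and the sketch uses a smooth “sigmoid” activation with an additive bias term, which is a common convention rather than the threshold description used in this book.

```python
import math


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, applies a sigmoid."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(1 / (1 + math.exp(-total)))
    return outputs


def forward(inputs, network):
    """Pass data through every layer in turn, input to output."""
    data = inputs
    for weights, biases in network:
        data = layer(data, weights, biases)
    return data


# Hypothetical network: 2 inputs -> 3 hidden nodes -> 1 output node.
NETWORK = [
    ([[0.5, -0.4], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.2, 0.5]], [0.05]),
]

print(forward([1.0, 0.0], NETWORK))
```

Only the first list (the inputs) and the final list (the output) would be seen by a user; the three intermediate values are the “hidden” layer.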


LAYERS | 77 


ASSIGNING IMPORTANCE 


AI algorithms include variables (mathematical values
that can change) that determine how data is processed
within an artificial neural network (ANN, see p.76).
When designing and training an ANN, programmers can
give these variables greater or lesser influence within
the algorithm. This influence is known as “weight.” The
more weight an input has, the greater its influence over
the output. The “bias” (see opposite) determines the
threshold at which variables become significant.
Adjusting the weights and bias allows the ANN to
be fine-tuned to give more accurate results.


SHOULD I SNACK ON AN APPLE? 


INPUTS

WEIGHTS

EVALUATION


78 | WEIGHTING 


GOALS AND 
THRESHOLDS 


NOT ACTIVATED
The value of the node's
output falls below the
bias, so nothing is
forwarded to the next
layer of the ANN.

Value of the
node's calculation

ACTIVATED


An artificial neural network (ANN, see p.76) is made up of
layers of “nodes,” which receive and process data. Before
a node can pass information on to the next layer of
nodes, its output data must reach a certain value. This
value, essentially a numerical score set by the ANN
designer, is known as the “bias.” The node can only
“activate” and pass on its output data once the bias has
been met. If the node is not activated, that path of data
transmission stops. Different biases can also be set to
direct data to specific nodes on the next layer of the ANN.
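Weights and a bias threshold can be combined in a single node. The sketch reuses the “Should I snack on an apple?” question; the three inputs, their weights, and the threshold of 0.5 are invented values.

```python
def node(inputs, weights, bias):
    """Weighted sum of inputs; pass it on only if it reaches the bias."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total >= bias else None  # None: the node is not activated


# Hypothetical inputs: [am I hungry?, is an apple available?, did I just eat?]
WEIGHTS = [0.6, 0.3, -0.8]  # a negative weight counts against snacking
BIAS = 0.5                  # threshold the weighted sum must reach

print(node([1, 1, 0], WEIGHTS, BIAS))  # activated: a value is passed on
print(node([1, 1, 1], WEIGHTS, BIAS))  # not activated: prints None
```

Raising the weight on “did I just eat?” or raising the bias would both make the node harder to activate, which is the kind of adjustment made during fine-tuning.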


BIAS | 79 


Cost function 
The difference between the
expected and actual outputs gives
the model's performance. The goal
is for them to be equal.


MEASURING SUCCESS 


The performance of a machine-learning model, such
as an artificial neural network (see p.76), can be evaluated
by its “cost function.” This is a measure, taken during
training, of the difference between the actual outputs from the
model and the outputs expected by the programmer. This
difference, called the “cost,” is expressed as a number. The
higher the number, the greater the gap between the real and
the anticipated outputs, and the poorer the model. As the
model learns, the cost reduces and performance improves.
The training is complete when the cost is zero, or as close
to zero as possible.
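One widely used cost function is the mean squared error, shown below. The text does not say which cost function it has in mind, so this choice, and the sample outputs, are assumptions.

```python
def cost(expected, actual):
    """Mean squared error: average of the squared gaps between outputs."""
    gaps = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return sum(gaps) / len(gaps)


expected = [1.0, 0.0, 1.0]

early_outputs = [0.4, 0.6, 0.5]  # hypothetical outputs early in training
late_outputs = [0.9, 0.1, 0.95]  # hypothetical outputs after more training

print(cost(expected, early_outputs))
print(cost(expected, late_outputs))
print(cost(expected, expected))  # 0.0: outputs match exactly
```

The later outputs give a smaller number than the early ones, reflecting a model whose performance has improved.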


80 | COST FUNCTION 


IMPROVING PERFORMANCE 


A machine-learning model improves its performance by
fine-tuning its settings. Instead of having to process huge
amounts of data, the model can start at a random data point
and then “nudge” its way toward a better solution. The
algorithm that trains it to do this is known as the “gradient
descent.” Each time the model adjusts its settings, the
gradient descent rates its success using the “cost function”
(see opposite). Plotting the gradient of the cost function on a
graph reveals a curve. The model reduces its cost function by
following the steepest downward slope. When the slope levels
off, the model is as good as it can be and it stops learning.
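Gradient descent can be shown on a cost curve with one setting. The curve (w - 3)², the starting point, and the step size are all chosen for illustration; a real model has many settings and a far more complicated cost surface.

```python
def cost(w):
    """A hypothetical cost curve whose lowest point is at w = 3."""
    return (w - 3.0) ** 2


def gradient(w):
    """Slope of the cost curve at w."""
    return 2 * (w - 3.0)


def gradient_descent(w=0.0, rate=0.1, steps=100):
    """Repeatedly nudge w a small step down the slope."""
    for _ in range(steps):
        w -= rate * gradient(w)
    return w


best = gradient_descent()
print(round(best, 4))  # 3.0
print(cost(best) < cost(0.0))  # True
```

As w approaches 3 the slope flattens, so each nudge becomes smaller, matching the description of the slope “leveling off.”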


Trial and error
Using feedback from a
gradient descent algorithm,
the model tweaks its
settings until it reaches
its best performance.

If the cost function
increases, the model takes a
small step backward.

At each point, the
gradient descent tells
the model how well it
is performing.

The lowest possible
cost function, where
the gradient is zero.


DESTINATION 


GRADIENT DESCENT | 81 


REFINING 
THE MODEL 


The “delta rule” (also known as the delta
learning rule) enables a single-layer artificial
neural network (ANN, see p.76) to boost its
performance by refining its settings. It makes use of
“gradient descent” (see p.81) to identify the best
choices for improving the model. As the model’s
output gets nearer to the expected output, smaller
adjustments are carried out, until the outputs are as
close as possible to each other. Backpropagation
(see p.84) is a generalized form of the delta rule that
applies to ANNs with any number of layers.
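For a single node with one weight, the delta rule adjusts the weight in proportion to the gap between expected and actual output. The training pairs (which follow output = 2 × input), the learning rate, and the number of passes are assumed values.

```python
def train(pairs, rate=0.01, epochs=200):
    """Delta rule for one linear node: output = weight * input."""
    weight = 0.0
    for _ in range(epochs):
        for x, expected in pairs:
            output = weight * x
            delta = expected - output    # gap between expected and actual
            weight += rate * delta * x   # small gap means small adjustment
    return weight


# Hypothetical training data in which the expected output is 2 * input.
PAIRS = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = train(PAIRS)
print(round(weight, 4))  # 2.0
```

Because each adjustment is scaled by `delta`, the changes shrink automatically as the output nears the expected value.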


82 | THE DELTA RULE 




INFORMATION FLOWS 
FORWARD ONLY 




A ONE-WAY 
NETWORK 


A “feedforward neural network” (FNN) is a simple artificial neural
network (ANN, see p.76) in which information flows forward
only: from the input layer, through the hidden layers, to the
output layer. The connections between the nodes in an FNN do
not form “feedback loops.” In other words, outputs are not fed
backward as inputs, as they are in a recurrent neural network
(RNN, see p.85). The most basic form of a feedforward neural
network is a single artificial neuron (see p.21), which can undergo
machine learning using gradient descent (see p.81).


FEEDFORWARD NEURAL NETWORKS | 83 


INPUT LAYER

HIDDEN LAYER 1

The algorithm moves
backward through the ANN,
fine-tuning layer by layer.


FINE-TUNING DATA 


Backpropagation is a type of algorithm that is used to train artificial 
neural networks (ANNs, see p.76), specifically feedforward neural 
networks (see p.83). It is known as backpropagation because it begins 
at the final (output) layer and moves in reverse towards the first (input) 
layer. During this process, nodes are reprogrammed by adjusting their 
weights (see p.78) and biases (see p.79), using gradient descent 
(see p.81) to find out whether increasing or decreasing them will 
produce better results. This has the effect of fine-tuning the ANN 
to produce more accurate outcomes overall. 
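Backpropagation can be traced by hand on the smallest possible network: one input, one hidden node, and one output node. The starting weights, target, and learning rate are arbitrary, biases are left out to keep the arithmetic short, and the `tanh` activation is an assumed choice.

```python
import math


def forward(x, w1, w2):
    """Input -> hidden node (tanh) -> output node."""
    hidden = math.tanh(w1 * x)
    output = w2 * hidden
    return hidden, output


def train(x, target, w1=0.5, w2=0.5, rate=0.1, steps=500):
    """Adjust both weights, working backward from the output layer."""
    losses = []
    for _ in range(steps):
        hidden, output = forward(x, w1, w2)
        error = output - target
        losses.append(0.5 * error ** 2)
        # Output layer first: how the loss changes with w2.
        grad_w2 = error * hidden
        # Then carry the error back through the hidden node to w1.
        grad_w1 = error * w2 * (1 - hidden ** 2) * x
        w2 -= rate * grad_w2
        w1 -= rate * grad_w1
    return w1, w2, losses


w1, w2, losses = train(x=1.0, target=0.5)
print(losses[0], losses[-1])
```

The gradient for the hidden-layer weight reuses the error computed at the output, which is the “moving in reverse” that gives the algorithm its name.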


84 | BACKPROPAGATION 


INPUT LAYER

HIDDEN LAYER 1

Data can move
backward through the
ANN in feedback loops.


STRUCTURED DATA 


A recurrent neural network (RNN) is a type of ANN in which data
can move backward in a “feedback loop.” RNNs are used to process
sequential data (data that has to be in a specific order) such as
language. While traditional ANNs process individual data points to give
an outcome, RNNs maintain the essential structure and relationships
within sequential data, so that it remains intact. In doing so, RNNs can
be used to predict the next output of a sequence. They are used
widely in natural language processing tasks (see pp.112-13), including
training virtual assistants to carry out spoken conversations.
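The feedback loop can be sketched with a single recurrent node whose previous output is fed back in at each step. The two weights are arbitrary and the node is untrained; the point is only that the order of the inputs changes the result.

```python
import math


def run_rnn(sequence, w_in=1.0, w_back=0.5):
    """Process a sequence one item at a time, carrying a state forward."""
    state = 0.0
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        state = math.tanh(w_in * x + w_back * state)
    return state


print(run_rnn([1, 0, 0]))
print(run_rnn([0, 0, 1]))
```

Both sequences contain the same values, but the final states differ because the node “remembers” where in the sequence the 1 appeared.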


RECURRENT NEURAL NETWORKS | 85 




86 | DEEP LEARNING 


BUILDING A BRAIN 


AI VS AI

A generative adversarial network (GAN) is a machine-learning
model that uses two competing artificial neural networks
(ANNs, see p.76). One ANN, the “generator,” uses unlabeled
training data (see p.61), supplied by the programmer, to create
new, fake data, which is passed to the second ANN, the
“discriminator.” The discriminator aims to identify the fake
data. If it succeeds, the generator tries again, making fake data
that is harder to distinguish from real data. If the discriminator
fails, it tries again to identify fake data more effectively. This
process continues until the generator can make convincing fake
data. In other words, it can use existing data to create accurate
new data, such as predictions.

Machine learning
The discriminator has identified the fake data. The generator
tries again until it succeeds, learning in the process.

Spotted
Fake data is identified.

Not spotted
Fake data is not identified.

This ANN tries to identify the fake data from real data.

The fake data is not convincing enough. The generator tries
again, gradually improving the quality of its fakes.


GENERATIVE ADVERSARIAL NETWORKS | 87 


PROCESSING 
VISUAL DATA 


A convolutional neural network (CNN) is a type of deep neural network 
(see p.86) that is similar to the structure of the visual cortex: the part 
of the brain that takes and analyzes information from the eye. CNNs 
are effective tools for computer vision (see p.110), since they can be 
taught to recognize features in input images, such as the pointed ears 
of cats. There are three types of layers (see p.77) in a CNN. The first
type performs a function called a “convolution,” which allows features 


Input
The input in a CNN is typically an image, such
as a photograph of a cat.

Convolution
A filter is applied to the image to produce feature maps. This
enables features to be detected in patterns of pixels.

“I get very excited when we discover a way of
making neural networks better, and when that’s
closely related to how the brain works.”

Geoffrey Hinton


88 | CONVOLUTIONAL NEURAL NETWORKS, 


in an image to be detected. These layers first extract low-level features 
(lines and edges), before extracting higher-level features (shapes). They 
work by passing a filter over the image that creates a “map” of the 
location of each feature on the image. Between each convolution, 
there is a “pooling” layer, which reduces the complexity of the feature
maps. The data from these layers is flattened, and then passes through 
a “classification” layer (see p.66), which identifies and labels the image. 
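The convolution and pooling steps can be tried on a tiny grayscale image. The 5×5 image, the 2×2 edge filter, and the use of max pooling are all assumptions for illustration; real CNNs learn their filters during training.

```python
def convolve(image, kernel):
    """Slide the filter over the image to build a feature map."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(len(image) - k_rows + 1):
        row = []
        for c in range(len(image[0]) - k_cols + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k_rows) for j in range(k_cols)))
        feature_map.append(row)
    return feature_map


def max_pool(feature_map, size=2):
    """Keep only the strongest response in each size x size block."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]


# Hypothetical image: dark (0) on the left, bright (1) on the right.
IMAGE = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical filter that responds to dark-to-bright vertical edges.
EDGE_FILTER = [[-1, 1], [-1, 1]]

feature_map = convolve(IMAGE, EDGE_FILTER)
print(feature_map)            # each row is [0, 2, 0, 0]
print(max_pool(feature_map))  # [[2, 0], [2, 0]]
```

The feature map is nonzero only where the edge sits, and pooling shrinks the 4×4 map to 2×2 while keeping that information.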


Pool
“Mess” is cut out to reduce the amount of computing
power required, and the features are abstracted.

Classification
Through a process of classification, the AI associates
the data from the previous layers with an image of a cat.

Output
The AI identifies the photograph as being one of a cat.


CONVOLUTIONAL NEURAL NETWORKS | 89 


USING
ARTIFICIAL
INTELLIGENCE

USES OF AI 


UNDERSTANDING WORDS
(SEE PP.112-113)


INTELLIGENT ROBOTS
(SEE P.118)


92 | APPLICATIONS OF AI


“A world run by 
automatons doesn’t 
seem completely 


unrealistic anymore.” 


USES OF AI


Whether via mobile phones or virtual
assistants, we interact with AI applications
everywhere, often without realizing it.
Using these applications has changed the
way we work, shop, and communicate,
and has revolutionized many industries,
including finance, healthcare, and
agriculture. Other AI technologies, such as
generative AI and autonomous weaponry,
are still in their infancy, but these too will
soon be in mainstream use.


APPLICATIONS OF AI | 93


RANKING 


When an internet search engine is used, AI-generated
rankings determine which sites appear most highly in the
results. Some ranking algorithms locate and rank websites
that contain the same terms, or “keywords,” as those entered
into the search engine by a user. Those with the closest
matches rank highest. Other algorithms rank websites more
highly if they are accessed from many other sites, or if they
are especially popular.
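
The two ranking ideas above, keyword matching and links from other sites, can be combined in a toy example. This is a sketch under stated assumptions, not how any real search engine works: the page texts, the link counts, and the `rank_pages` function are all hypothetical.

```python
def rank_pages(query, pages, inbound_links):
    """Order pages by keyword matches first, then by number of inbound links."""
    keywords = query.lower().split()
    scores = {}
    for name, text in pages.items():
        words = text.lower().split()
        keyword_hits = sum(words.count(keyword) for keyword in keywords)
        scores[name] = (keyword_hits, inbound_links.get(name, 0))
    # Pages with no matching keyword are left out of the results entirely.
    matching = [name for name in scores if scores[name][0] > 0]
    return sorted(matching, key=lambda name: scores[name], reverse=True)


# Hypothetical pages and link counts, for illustration only.
pages = {
    "a.example": "dog care tips and dog food advice",
    "b.example": "cat care advice",
    "c.example": "dog care basics",
}
inbound_links = {"a.example": 3, "b.example": 10, "c.example": 7}

ranking = rank_pages("dog care", pages, inbound_links)
```

Here the closest keyword match always wins, and link counts only break ties; a real ranking algorithm would weigh many more signals against one another.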


94 | DATA HIERARCHIES 


Recommendations
The algorithm recommends selected
sites to the user.


RECOMMENDING 


Based on an internet user’s browsing history, and that
of others, AI recommendation algorithms can suggest
websites, as well as products, that may interest the user.
This can involve suggesting similar content to what the
user has viewed previously or offering sites that similar
web users have visited. To do this, algorithms make
predictions (see pp.70-71). For example, if an internet user
searches for advice on dog care, the AI algorithm will predict
that they have or want a dog. It then searches the internet
to find popular sites, and products, associated with dogs.
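
The idea of "offering sites that similar web users have visited" can be illustrated with a very small sketch. The browsing histories, user names, and the `recommend` function are hypothetical, and similarity is measured here simply by counting shared sites, which is far cruder than the prediction models a real system would use.

```python
def recommend(user, histories, top_n=2):
    """Suggest sites visited by users whose histories overlap with this user's."""
    visited = histories[user]
    scores = {}
    for other, other_sites in histories.items():
        if other == user:
            continue
        overlap = len(visited & other_sites)  # shared sites = similarity
        if overlap == 0:
            continue
        for site in other_sites - visited:  # only sites the user has not seen
            scores[site] = scores.get(site, 0) + overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda site: (-scores[site], site))
    return ranked[:top_n]


# Hypothetical browsing histories, for illustration only.
histories = {
    "ana": {"dog-care", "dog-food"},
    "ben": {"dog-care", "dog-food", "dog-toys"},
    "cal": {"dog-care", "vet-finder"},
    "dan": {"car-news"},
}

suggestions = recommend("ana", histories)
```

The user who shares the most history with "ana" has the strongest influence on what is suggested, while a user with nothing in common contributes nothing.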




TAILORED CONTENT | 95 


Traditional threat-detection software used by
cybersecurity experts searches for known malware
(see opposite) “signatures,” blocks the malware files
it detects, and raises alerts. Incorporating an AI into
the system enables cyber defenses to identify and
categorize new and mutated threats (“zero-day” malware)
that would otherwise be undetectable since they
do not match any known signatures. This is a vital
development in cybersecurity, given the speed
with which new threats arise. AIs are also used
to predict how and where a system might be
breached, and to help respond to breaches.


Log
The AI logs the malicious attack for future use, either
as historical data or to the list of signatures.

Action
The AI performs an action, such as raising an alarm or
blocking traffic, when an attack has been detected.


INTRUSION DETECTION 


Anomaly detection
An AI can monitor incoming potential
threats by identifying unusual traffic that
does not match patterns in historical data.


Signatures
Potential new threats can also be compared
against a list of attributes, or signatures,
that are known to belong to malware.
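
The four captions above (signatures, anomaly detection, log, action) can be tied together in a short sketch. Everything specific here is an assumption made for illustration: the signature strings, the traffic figures, the field names, and the rule that "unusual" means more than three standard deviations from the historical average.

```python
from statistics import mean, stdev

# Hypothetical signatures of known malware.
KNOWN_SIGNATURES = {"deadbeef", "badc0de"}


def inspect(event, history, signatures, log, z_limit=3.0):
    """Return 'block', 'alert', or 'allow' for one traffic event."""
    # Signature check: does the payload match known malware?
    if event["payload_hash"] in signatures:
        log.append(("signature", event))
        return "block"
    # Anomaly check: is the traffic rate far outside the historical pattern?
    average, spread = mean(history), stdev(history)
    if spread > 0 and abs(event["requests_per_min"] - average) / spread > z_limit:
        log.append(("anomaly", event))
        return "alert"
    return "allow"


history = [100, 110, 95, 105, 90, 100]  # hypothetical past requests per minute
log = []
results = [
    inspect({"payload_hash": "deadbeef", "requests_per_min": 100}, history, KNOWN_SIGNATURES, log),
    inspect({"payload_hash": "abc123", "requests_per_min": 900}, history, KNOWN_SIGNATURES, log),
    inspect({"payload_hash": "abc123", "requests_per_min": 102}, history, KNOWN_SIGNATURES, log),
]
```

The second event matches no signature but is still caught, which is the advantage the text describes: unusual behavior can reveal a threat that has never been seen before.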


96 | CYBERSECURITY 


Hacking
[...] to access other people's
digital information is
known as “hacking.”


Ransomware
Ransomware is designed
to find and encrypt files
on a device, making them
inaccessible until a
ransom is paid.


Malware
Short for “malicious
software,” malware is
any computer program
that damages a device or
gains access to sensitive
information.


Denial-of-service
A denial-of-service
(DoS) attack floods
a server with data,
overwhelming it to the
point at which it can no
longer function.


Disinformation
Using the internet,
enemy agents can spread
false news stories to
influence public opinion,
create instability, and
stir up social unrest.


ONLINE
ATTACKS


The use of cyberattacks to target a nation state is known as “cyber
warfare.” It is possible to inflict serious harm on a country remotely,
disrupting key services and critical infrastructure such as power grids
by disabling the information systems that control them. Cyber warfare
tactics include denial-of-service (DoS) attacks, malware such as viruses
and ransomware, disinformation campaigns, and state-sponsored
hacking. AI is used in cyber warfare to enhance these attacks, making
them faster and more sophisticated. For example, AI-driven malware is
very hard to detect: it is able to use machine learning (see pp.58-59) to
find weaknesses in a device's security system, attack it while posing as
an accidental error, and then cause harm to the device.


CYBER WARFARE | 97 


An AI compares purchasing
patterns in customers.

DETECTING FRAUD 


Financial institutions are adopting AI systems to detect and
prevent fraud. These systems can process vast amounts of
data about past transactions, learning the ordinary patterns
of behavior of a bank's customers. When transactions are
made that do not fit this pattern (see p.69), an AI may flag
them up as needing to be investigated or take other actions,
such as freezing the customer's account. An AI may score
each transaction on its likelihood of being fraudulent, then
raise an alert when this score exceeds a certain threshold.
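
Scoring a transaction and comparing the score with a threshold might look like the sketch below. It is deliberately simple and rests on assumptions: the only "pattern" learned is the average and spread of a customer's past spending, the score is a rough measure between 0 and 1 rather than a true probability, and the 0.8 threshold and function names are hypothetical.

```python
from statistics import mean, stdev


def fraud_score(amount, past_amounts):
    """Score between 0 and 1: how far the amount sits above usual spending."""
    average, spread = mean(past_amounts), stdev(past_amounts)
    if spread == 0:
        return 0.0 if amount == average else 1.0
    # Number of standard deviations above the average (never negative).
    z = max(0.0, (amount - average) / spread)
    return z / (1.0 + z)


def review(amount, past_amounts, threshold=0.8):
    """Raise an alert only when the score exceeds the threshold."""
    return "alert" if fraud_score(amount, past_amounts) > threshold else "ok"


# Hypothetical transaction history for one customer.
past = [20.0, 25.0, 30.0, 22.0, 28.0]
ordinary = review(27.0, past)
unusual = review(500.0, past)
```

Raising the threshold produces fewer false alarms but lets more fraud through; lowering it does the opposite, which is the trade-off a bank has to tune.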


98 | TRANSACTION MONITORING 




AI IN FINANCE 


High-frequency trading (HFT) is the use of specialized
algorithms to make investment decisions and transactions
at superhuman speed, performing millions of trades each
day. Some financial institutions manage entire investment
portfolios using HFT. By evaluating vast quantities of market
data in real time, it can identify the best stocks and shares to
buy and sell, identify the optimal time to place those deals,
and perform transactions extremely quickly. To help inform
its decisions, HFT may use natural language processing (NLP,
see pp.112-113) to analyze news reports and social media.


BROKER
EXCHANGE


HFT ALGORITHMS


HFTs are often sited
physically close to stock
markets to reduce any
delay in their ability to
make transactions.


ALGORITHMIC TRADING | 99


MEDICAL RESEARCH


AIs not only speed up tedious work; they also help
open up new fields of scientific research. For
example, using deep learning (see p.86) and
painstakingly collected experimental data, scientists
have taught AIs to predict the 3D structure of
“folded proteins” (the building blocks of life) with
atomic precision. Previously, scientists could not tell
how a protein's chemistry determined its folded
structure. This “protein folding problem” was so
complex that it remained unsolved for decades.
Today, understanding how these proteins work has
transformed medical research and accelerated the
process of developing new drugs.


DIGITAL
DOCTORS


[...] powerful tool for assisting
doctors. Machine learning, and especially deep
learning (see p.86), has proved effective at
[...] disease in medical imagery,
[...] of lung cancer [...]
problems [...] using [...]
of patients’ eyes. AI is also used to
identify people at high risk of
certain conditions, prioritize
urgent cases, and help doctors
to select treatments.


102 | AI IN MEDICAL DIAGNOSIS


MONITORING HEALTH 


AIs perform an important role in a new field of medicine known as
telehealth. By wearing sensors that monitor vital bodily functions, such
as oxygen intake and blood pressure, a person can go about their day
knowing that if a sensor detects a problem, it will send a signal to their
digital assistant (an app on their phone or personal computer), which in
turn will alert, via the internet, an AI at a healthcare center. This AI
will then compare the digital assistant’s report with previous data
about the person and alert a physician if necessary. Crucially, this
technology can detect problems that a person may not even be
aware of. More generally, AI technologies can also be used to
monitor people’s general fitness and well-being.


AI AND HEALTHCARE | 103


INTERNET OF THINGS 


The “Internet of Things” (IoT) is the network of interconnected
devices that collect and exchange data via the internet: not only
phones and computers, but also smart refrigerators, driverless cars,
fitness monitors, security cameras, and tens of billions of other items.
Due to the vast amounts of data that these devices collect, and the
requirement that they respond appropriately to their users and
environment, AIs have become integral to the IoT. A smart energy
meter, for example, may use an AI to identify patterns in a user’s
energy consumption and suggest adjustments to reduce their bills.


INTERNET 


CAMERAS 


LIGHTING 
AND HEATING 


Smart home
Domestic appliances
and even heating and
lighting systems are
increasingly internet
enabled, extending
the reach of the IoT
throughout the home.


APPLIANCES


MOBILE
PC


104 | CONNECTED DEVICES 


NETWORK


SENSORS / ALERTS


Embedded learning
The AI learns within the device,
using data collected by its sensors.


SMART DEVICES 


The “intelligence” in the Internet of Things
(IoT, see opposite) is mostly contained within
clouds: remote computing systems usually owned
by technology companies. Increasingly, however, AI
software capable of machine and deep learning is
being embedded in devices, such as mobile phones
and smart watches. Using embedded AI removes
the need to send data to and from the cloud
continuously, reducing power usage, data
processing time, risk of data breaches, and reliance
on cloud providers. In real-time monitoring devices
(see p.106), embedded AI allows almost instant
detection and response.


EMBEDDED AI | 105


MONITORING SYSTEMS 


The “Internet of Things” (see p.104) enables AIs to monitor
all kinds of equipment automatically, up to major infrastructure
systems, such as gas pipelines, transportation networks, and electricity
grids. Sensors distributed throughout these systems collect and
transmit their data to AIs, which then scan the data for anomalies
(see p.69) and alert human technicians to investigate them further if
necessary. AIs are also used to predict where faults could occur in the
future, enabling technicians to take action to prevent equipment
failure. Such measures minimize the disruption caused by using
complex equipment that needs regular maintenance.


Sensing a leak
Having detected a leak, the
AI runs through a decision tree
(see p.65) that instructs it to
turn off the nearest valve
upstream of the leak.


SENSOR 1    SENSOR 2    SENSOR 3
WATER PIPE


CONTROL
ROOM


Monitoring pressure
Sensors monitor the pressure within a water pipe and wirelessly
transmit their data to an AI. Here, the AI detects an anomaly: the
pressure is lower than it should be between sensors 2 and 3.
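
The pipe example can be written as a tiny anomaly check followed by a hand-made decision tree. The pressure values, the 10 percent tolerance, and the assumption that there is a valve at every sensor are all hypothetical; sensors are listed in order from upstream to downstream.

```python
def first_low_sensor(readings, expected, tolerance=0.1):
    """Return the index of the first sensor reading too low, or None."""
    lowest_normal = expected * (1 - tolerance)
    for index, pressure in enumerate(readings):
        if pressure < lowest_normal:
            return index
    return None


def decide(readings, expected):
    """A minimal decision tree: no anomaly, fault at the inlet, or a leak mid-pipe."""
    low = first_low_sensor(readings, expected)
    if low is None:
        return "no action"
    if low == 0:
        return "alert technician: fault upstream of sensor 1"
    # The leak lies between the last normal sensor and the first low one.
    # Sensors are numbered from 1, so index `low` is sensor low + 1 and the
    # nearest sensor upstream of the leak is sensor number `low`.
    return f"close valve at sensor {low}"


# Hypothetical readings (in bar) from sensors 1, 2, and 3.
action = decide([5.0, 4.9, 3.8], expected=5.0)
```

With these readings the pressure is normal at sensors 1 and 2 but low at sensor 3, so the leak is placed between sensors 2 and 3 and the valve just upstream of it is closed.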


106 | AI AND INFRASTRUCTURE


AI cloud
When it has finished
scanning the field, the
robot transmits its
data to its cloud.

Analysis
The farmer downloads
the information and
sees which crops need
special attention.


Crop scanner
A ground-based
robot scans a
farmer’s field. Using
lasers and cameras, it
compiles a 3D image
of the entire crop.


“SMART” FARMING 


AI is a key technology in “precision agriculture,” an approach to
farming that optimizes the use of water and other resources in order
to increase yields and minimize waste. Using devices such as drones in
the air and robots on the ground, which collect data analyzed by AIs,
farmers can receive real-time information about their crops, enabling
them to know which of them require water, pesticide, or fertilizer
at any time. Such precise methods of farming may become
indispensable in the coming decades, when the global population
is set to increase by two billion people.


PRECISION AGRICULTURE | 107


SENSORY AI


A key aspect of human intelligence is the ability to perceive the
world through sight, hearing, touch, smell, and taste. Machine
perception is the ability of computers to sense their surroundings
via dedicated hardware (such as cameras and microphones), and
to interpret the collected data and react appropriately. This allows
computers to receive information from sources other than a
keyboard and a mouse, which is a step toward aligning AI with
human intelligence. Machine perception, which is vital for
embodied AI (see p.118), includes computer vision (see p.110),
machine hearing (see opposite), machine touch, machine smelling,
and machine taste.


108 | MACHINE PERCEPTION 


TRAINED
NEURAL NETWORK


OUTPUT


A neural network trained through
labeled or unlabeled data (see
pp.72-73) can identify different
instruments in the mixture of sounds.


PROCESSING SOUND 


Machine hearing is the ability of a computer to sense and process
audio data, such as music or human speech. This interdisciplinary field
employs both classical (see p.35) and statistical (see p.57) approaches
to AI. Engineers developing machine-hearing technologies attempt to
replicate the abilities of the brain that people typically take for granted,
such as focusing on a specific sound amid background noise. Speech
recognition is a complex subfield within machine hearing. It aims to
comprehend meaning in spoken language, often using
deep learning (see p.86) to train models.


MACHINE HEARING | 109 


Computer vision is the ability of a computer
to recognize images and videos: for example, to
understand that a certain arrangement of pixels
is associated with a picture of a cat. Engineers
working in computer vision aim to automate the
tasks performed by biological visual systems, such
as the human eye and parts of the nervous system.
The rise of deep learning (see p.86) using
multilayered artificial neural networks (ANNs, see
p.76) and the availability of very large training data
sets online have greatly advanced the field.
Computer vision is used in many areas, including
facial recognition (see opposite).


110 | COMPUTER VISION 


CAT


FACIAL RECOGNITION 


Facial recognition is a form of computer vision technology
(see opposite) that matches photographs or videos of human
faces to those stored in a database. An image of a face
is captured, and its distinctive features, such as
the distance between the eyes, are mapped to
create a unique “faceprint” that is then compared
with known faceprints. Facial recognition is mainly used for
security, such as an authentication process on phones, and law
enforcement, such as to identify someone from
a database of known offenders.
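
Comparing a new faceprint with known faceprints can be pictured as measuring the distance between lists of numbers. In this sketch a faceprint is assumed to be just three measurements, the names and values in the database are invented, and the `max_distance` cut-off is an arbitrary choice; real systems use far longer feature vectors produced by a neural network.

```python
import math


def closest_match(probe, database, max_distance=0.1):
    """Return the name whose stored faceprint is nearest to the probe, if near enough."""
    best_name, best_distance = None, float("inf")
    for name, faceprint in database.items():
        distance = math.dist(probe, faceprint)  # straight-line (Euclidean) distance
        if distance < best_distance:
            best_name, best_distance = name, distance
    # Reject the match if even the nearest faceprint is too different.
    return best_name if best_distance <= max_distance else None


# Hypothetical faceprints: each number stands for one mapped facial feature.
database = {
    "alice": (0.42, 0.31, 0.77),
    "bob": (0.55, 0.29, 0.60),
}

known_face = closest_match((0.43, 0.30, 0.78), database)
unknown_face = closest_match((0.90, 0.90, 0.10), database)
```

The cut-off matters: set it too loosely and strangers are matched to someone in the database; set it too tightly and genuine matches are missed.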


FEATURE MAPPING | 111


UNDERSTANDING 
WORDS 


The ability of computers to “understand” and
generate natural language (that is, language as it is
typically spoken and written by humans) is a key
element of mimicking human intelligence. This idea
lies at the heart of the Turing test (see pp.130-131).
Natural language processing (NLP), the research field
dedicated to developing this ability, brings together
AI, linguistics, and other disciplines. In the 1950s,
researchers tried to emulate “linguistic intelligence” by
providing computers with collections of handwritten
language rules. More recently, the explosion in
computing power and big data (see p.33) has enabled
machine learning, particularly deep learning, to be
integrated into NLP with impressive results. Among its
many applications, NLP is used in machine translation
(see p.114) and virtual assistance (see p.116).


Elements of NLP
There are five elements to
NLP, which involve arranging
letters into words and
interpreting the intended
meaning of sentences.


112 | NATURAL LANGUAGE PROCESSING 


Lexical analysis
Lexical analysis involves
structuring an example
of natural language into [...]

Syntactic analysis
The application of the formal
rules of grammar to
natural language is known
as “syntactic analysis.”

Semantic analysis
Semantic analysis is the process
of determining the literal
meaning of the words in an
example of natural language.

Discourse integration
The meanings of consecutive
sentences are considered
together to give context
to words and phrases.

Pragmatic analysis
Pragmatic analysis goes beyond
the literal meaning of the words
and attempts to interpret their
intended meaning.
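
The first of these elements, lexical analysis, is the easiest to illustrate. The sketch below assumes that "structuring" a piece of text simply means splitting it into sentences and then into lowercase words; the function name and the splitting rules are illustrative choices, and real NLP systems handle abbreviations, numbers, and other languages far more carefully.

```python
import re


def lexical_analysis(text):
    """Split text into sentences, then each sentence into lowercase word tokens."""
    # Break after a period, question mark, or exclamation mark followed by a space.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Keep runs of letters (and apostrophes) as words; punctuation is dropped.
    return [re.findall(r"[a-z']+", sentence.lower()) for sentence in sentences]


tokens = lexical_analysis("Robot, do I need an umbrella today? It is likely to rain.")
```

The later elements build on this output: syntactic analysis would check the word order against grammar rules, and semantic and pragmatic analysis would work out what the words mean.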


NATURAL LANGUAGE PROCESSING | 113 


AI INTERPRETERS


Machine translation (MT) is the use of AI in the automated
translation of text or speech from one language to another. Translation
is a far more complex and subtle matter than simply substituting each
word for its equivalent in another language. Consequently, MT is
currently used more as a tool than as a replacement for human
translators. There are three broad approaches: “rule-based MT” relies
on linguistic rules, such as grammar and syntax; “statistical MT” uses
the known relationships between words to predict whole sentences
and phrases; “neural MT” uses artificial neural networks (ANNs, see
p.76) trained to understand languages almost as well as people do.


MACHINE TRANSLATION IN ACTION 


Rule-based MT
This approach gives a quick but basic
translation. Text and speech can be
understood but often require
further editing.


Statistical MT
This approach predicts words
and sentences, and may not be fully
accurate. The translated text often
still requires further editing.


Neural MT
A trained ANN is accurate and can
be constantly improved. Training an
ANN requires huge amounts of data
and is very costly.


114 | MACHINE TRANSLATION 


REQUEST:


“Robot, do I need to take
an umbrella today?”


“Yes. It is likely to rain.
Take a coat, too.”


TALKING WITH AI 


Chatbots, such as those employed by virtual assistants (see p.116), are
programs that can carry out conversations via text or text-to-speech.
Natural language processing (NLP, see pp.112-113) helps chatbots
mimic how humans talk during conversations. Based on classical AI,
statistical AI, or a combination of the two (see pp.54-55), chatbots can
range in sophistication. Businesses often use basic chatbots to answer
simple customer queries instead of providing immediate contact with
a human employee. The most sophisticated chatbots, such as ELIZA
in the 1960s, can give the impression of intelligence.


CHATBOTS | 115 


AI HELPERS


A virtual assistant is a software application or device that uses
machine hearing (see p.109) and natural language processing
(NLP, see pp.112-113) to perform tasks on command, such as searching
the internet, playing music, or setting timers and alerts. Basic virtual
assistants are essentially chatbots (see p.115), while more complex
models can interact with other “smart” devices, via the Internet of
Things (see p.104), to activate systems such as domestic lighting and
heating. Many virtual assistants are cloud-based, and continually
use voice data for training, which enables them to get better
at predicting a user's needs and preferences.


Machine hearing
converts audio
to text.


Via the Internet
of Things, the
assistant completes
the task.


The user makes a voice
command, such as
“turn on the lights.”


116 | VIRTUAL ASSISTANTS


The text encoder within
the AI translates the [...]


INSTRUCTIONS


The AI matches the
instructions with [...]
to create suitable images.


The image encoder
[...] the composite
image for the user.

CREATING 
AN IMAGE 


AI ARTISTS


Generative AI is the field dedicated to synthesizing new
content, such as images, audio, text, or video, based on an
input in any of these formats. For instance, a generative AI
model could be trained to produce an image of a cartoon
giraffe when prompted with the text input “cartoon giraffe.”
AI image-generation has existed since the 1960s, and can
use a variety of classical and statistical techniques. Recently,
however, generative adversarial networks (GANs, see p.87)
have proved such effective “artists” that they have prompted
debate about whether art can be considered uniquely human.


GENERATIVE AI | 117


INTELLIGENT ROBOTS 


AIs that are designed to interact physically with their environments
are known as “embodied AIs.” Such AIs, which include robots, mimic
not only human cognitive intelligence, but human physical behavior as
well. They do so with the help of sensors, motors, and other hardware,
which enable them to perceive (see p.108), move in (see p.120), and
affect (see p.121) their three-dimensional environments. Constructing
such machines is an important step forward in AI, since much of what
we consider to be intelligence in human beings involves our ability to
interact with our surroundings. Embodied AIs already include robotic
vacuum cleaners and lawn mowers.


118 | EMBODIED AI


AI COMPANIONS


A social robot is an embodied AI (see
opposite) that is capable of interacting socially
with humans, using speech, movement, facial
expressions, and other humanlike behaviors.
Social robots are limited as companions, since
it is difficult to replicate many basic human
abilities, such as manipulating objects (see
pp.52-53) or understanding tone of voice.
While largely treated as a novelty, they are
nevertheless sometimes used in health and
social care to alleviate loneliness, depression,
and anxiety. Although they can come in
any shape and size, most social robots
are humanoids.


SOCIAL ROBOTS | 119 


MOVEMENT AND MOBILITY


Many robots are kept in stationary positions, such as on
production lines, but some, such as unmanned rovers, can
move around and explore their environments. These mobile
robots have varying degrees of autonomy: some are controlled
remotely by human beings, while others can navigate without
human intervention. A fully autonomous robot has an AI that
can process data collected by sensors, such as optical cameras
and LiDAR (see p.122), to plan the path ahead.


SENSING
FUSION
PERCEPTION
PLANNING


120 | PHYSICAL INTERACTIONS I


A touch system
enables the robot
to pick the apple.


The robot navigates in the
direction of the fruit.

MANUAL DEXTERITY 


One of the greatest challenges in robotics is building 
machines that can interact physically with their 
environments. To perform even the simplest human 
action, such as picking an apple, a robot must have an 
excellent sense of sight, as well as a sense of touch, 
which enables it to apply just the right amount of 
pressure to manipulate objects correctly. Many such 
robots are currently used in controlled settings, such as 
factories, but they may soon be sophisticated enough 
to help with domestic chores in people’s homes. 


PHYSICAL INTERACTIONS II | 121 


DRIVERLESS CARS 


Autonomous vehicles are examples of mobile robots (see p.120). They
use systems incorporating sensors, AI, and actuators (see p.27) to assist
or wholly replace the human operator of a vehicle, whether on land or
sea, or in the air. “Self-driving” or “driverless” cars are a category of
autonomous vehicle under development, and cars incorporating
semi-autonomous technology are now available. Their arrival on
roads is raising complex legal and ethical questions, such as who would
be responsible for accidents caused by AI-controlled cars (see p.152).


Radar detects other
vehicles and
their speed, distance,
and direction of travel.

Laser scanning
produces a 3D map
of the vehicle's
surroundings.

A camera reads
signs and
identifies traffic
light colors.


GPS receiver monitors
the car's location and
plots the best route.


Road sense
A central computer analyzes data from
multiple sensors, enabling the car to
“understand” the driving environment.


122 | AUTONOMOUS VEHICLES


The fully autonomous
drone patrols the sky and
is able to act on what its
sensors detect.


AI AND
WARFARE


Military needs drive much AI innovation. This has
led to the creation of sophisticated autonomous
systems that can perform military tasks with little
or no human intervention. Some, including
reconnaissance drones, are nonlethal. Others,
such as sentry guns, are deadly weapons in their
own right, capable of identifying, locking onto,
and firing at targets. There is much debate over
whether to ban the deployment of lethal weapons
that are fully autonomous: those that enable
rapid response by removing the need for a human
to give the final order to attack.


AUTONOMOUS WEAPONS | 123 


PHILOSOPHY OF
ARTIFICIAL
INTELLIGENCE


AIs are designed to mimic human behavior: to calculate
the way we do things, or, in the case of androids, to interact
with the environment with humanlike agility. However, as AIs
become more and more sophisticated, the question arises as
to where we should draw the line between the human and the
artificial. Or, to ask the question another way: at what point
should we say that an AI is, in fact, a person, with all of the
qualities that a human has, and so should be granted rights?
The philosophy of AI addresses this central question. It
examines the concepts of free will and consciousness, and asks
what the difference is between an intelligence that has evolved
biologically and one that has been built by human beings.


HUMANLIKE
AI


For many scientists, building an AGI (artificial general
intelligence) is the ultimate goal of AI research,
although it may never be achieved. An AGI would be
as intelligent as a human being, and may even have
other human faculties, such as emotions or even
consciousness. Another name for AGI is “strong AI,” a
name that contrasts it with “weak AI,” which refers to
all other AIs that are built to perform specific tasks.
Unlike a weak AI, an AGI would have something like
intuition: the ability to know that something is
true without resorting to co[...]


126 | ARTIFICIAL GENERAL INTELLIGENCE 


In the near future,
artificial intelligence
will outstrip human
intelligence.


THE POINT OF 
NO RETURN 


In cosmology, a “singularity” is a point in space at
which the familiar laws of physics break down, creating a
phenomenon known as a “black hole.” In AI, the singularity
is the name given to the point in time at which a machine
will become as smart as the people who built it, and
therefore clever enough to improve itself. Such a machine
would be able to operate at the lightning speeds of a
supercomputer, and so would swiftly achieve incredible
abilities, including the ability to design AIs itself. The
singularity could therefore transform the world in
ways that we simply cannot predict.


THE TECHNOLOGICAL SINGULARITY | 127 


WHERE IS CONSCIOUSNESS? 


For centuries, philosophers have debated the question of how the
mind and the brain interact, or, more broadly, how such a thing as
consciousness can even exist in a physical world. The debate intensified
in the 17th century, when scientists proposed that the universe is like
a machine: a clockwork mechanism whose workings are in principle
predictable. However, German philosopher Gottfried Leibniz
(1646-1716) argued that if the physical world is mechanical, then the
human brain must be linked to the rest of the body by the biological
equivalents of cogs and pulleys. But if that is the case, he argued,
then there is no place in the brain for consciousness, which he
believed cannot be explained mechanically.


THE IMITATION
GAME


Alan Turing (see pp.18-19) 
devised a test, now called 
the Turing test, that 
provides a means for 
judging whether or not a 
machine is intelligent. The
test is based on a Victorian 
parlor game, in which one 
person tries to figure out 
whether another person, 
who is hidden behind a 
screen, is male or female, 
judging by the answers 
they give to certain 
questions. In the Turing 
test, both a human and 

a computer are hidden 
behind a screen, and an 
examiner supplies them 
with mathematical 
problems to solve. If 

both sets of answers are 
correct, then the examiner 
cannot say which are the 
computer's and which are 
the human's. The computer 
has therefore passed the 
test, and can be said to

be intelligent. 




130 | THE TURING TEST 


“If a machine
is expected to
be infallible, it
cannot also be
intelligent.”


Alan Turing


THE TURING TEST | 131 


MACHINES AND 
UNDERSTANDING 


American philosopher John Searle (1932-) rebutted
the idea that machines can think by arguing that
while machines follow rules, they are incapable of
understanding them (see pp.130-131). In what he
called the “Chinese Room” thought experiment,
Searle imagined a person in a room receiving questions
written in Chinese. If the person had the appropriate
rule book, they would be able to reply to the questions
in writing, without actually understanding either the
questions or the answers. Searle argued that to say
that a computer can think is similar to saying that
the person in the example understands Chinese.


CHINESE SPEAKER    NON-CHINESE SPEAKER    CHINESE SPEAKER


THE CHINESE ROOM EXPERIMENT | 133 


A NEW KIND 
OF PERSON 


Many scientists argue that, one day, AIs will be so lifelike that they
should be treated like human beings. They claim that since humans
have rights on the basis that they have free will, AIs that pass a
“free will test” should therefore have the same protections under
law. This means that, in the future, an AI could claim ownership
of its intellectual property, and even be penalized for making
mistakes. Legally, such an AI would no longer be a machine,
but a person: effectively, a new kind of human being.


Personhood
If an AI is treated like a person,
it may be granted both rights
and responsibilities.


AI RIGHTS AND RESPONSIBILITIES | 135 




REPLICATING 
THE MIND 


According to the principle of multiple realizability (see p.20),
the same computer programs can be run, or “realized,” on
different devices. Computationalists (see p.12) argue that
human thought is computable, and so can be realized by a
machine as well as a brain. If this is true, then it should be
possible to write a program that replicates a human mind,
which could then be copied and transferred like any other
program. This means that a person’s mind could be
uploaded to a remote server, downloaded to a robot,
and even duplicated innumerable times.


136 | MULTIPLE REALIZABILITY 


Human-human interaction
An individual knows what they
themselves mean by “beetle,” but
they cannot be sure that it means
the same thing to someone else.


TRANSPARENT THINKING 


The philosopher Ludwig Wittgenstein (1889-1951) argued that a
person’s thoughts were like objects in a closed box, into which
only they could “see.” We can never know what another person is
really thinking or exactly what things mean to them, since the
“box” is closed to us. A machine intelligence, however, could be
examined in ways a human mind cannot. If the machine said it was
thinking of a beetle, its programming could be exposed, “opening
the box” to show precisely what it meant by “beetle.” Such
developments might, in turn, shed light on the mechanisms
of human consciousness and thinking.


Human-AI interaction
[...] AI's programming to see whether
it is thinking of the same beetle.


OPENING THE BOX | 137 


LIVING WITH
ARTIFICIAL
INTELLIGENCE


GARBAGE IN, GARBAGE OUT 


Machine-learning networks (see pp.58-59) are only as
good as the data on which they are trained. The most
common cause of inaccurate results from an AI system
is poor-quality training data, which includes input data
that is incomplete, poorly labeled, full of errors, or biased
(see opposite). For example, predictive AI systems
(see pp.70-71) trained on inconsistent and incorrect
historical data will produce useless predictions. In the field
of computer science, the idea that bad inputs produce
bad outputs is informally summarized as “garbage in,
garbage out,” or “GIGO.”
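
The effect can be seen in a small sketch. The Python below is only an illustration: the data, the function names, and the three wrongly recorded values are all invented. It fits a straight line to a clean data set and to a copy containing errors, then checks how well each fitted line predicts the true values.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average distance between the model's predictions and the given values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical "true" relationship: y = 2x + 1.
xs = list(range(10))
clean_ys = [2 * x + 1 for x in xs]

# "Garbage" copy of the training data: three values recorded wrongly.
garbage_ys = list(clean_ys)
garbage_ys[1] = 40
garbage_ys[2] = 35
garbage_ys[8] = 0

clean_model = fit_line(xs, clean_ys)
garbage_model = fit_line(xs, garbage_ys)

# Both models are judged against the true values.
clean_error = mean_abs_error(clean_model, xs, clean_ys)
garbage_error = mean_abs_error(garbage_model, xs, clean_ys)
print(f"error after clean training:   {clean_error:.2f}")
print(f"error after garbage training: {garbage_error:.2f}")
```

The learning method is identical in both cases; only the quality of the training data differs, and that alone is enough to spoil the predictions.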


142 | DATA QUALITY 


PREJUDICED


Sample
Samples that don't represent the data set.

Prejudice
Societal prejudice reflected in data sets.

Measurement
Faulty measurements in the data.

Algorithm
Bias built into algorithms.

Exclusion
Important features excluded from a data set.


The term “AI bias” is used to describe AI systems that produce unfair
results for particular groups of people. AI biases often reflect
prejudices in society about gender, ethnicity, culture, age, and
many other characteristics. Bias usually stems from the programmers
themselves, via their algorithms and their interpretations of results,
and from the data sets used to train an AI (see opposite). To combat
this, programmers test their models to ensure that societal bias is not
reflected in their results and use data sets that are representative.
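
One simple kind of test compares how often a model gives a favorable result to each group. The sketch below is a hypothetical illustration, not a standard procedure: the group labels, the decisions, and the 0.8 cut-off are all assumptions chosen for the example.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Return the share of positive decisions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in records:
        totals[group] += 1
        positives[group] += int(decision)
    return {group: positives[group] / totals[group] for group in totals}


def rate_ratio(rates):
    """Lowest group rate divided by the highest (1.0 means equal rates)."""
    return min(rates.values()) / max(rates.values())


# Invented model outputs: (group label, was the applicant approved?).
decisions = (
    [("group_a", True)] * 8 + [("group_a", False)] * 2
    + [("group_b", True)] * 4 + [("group_b", False)] * 6
)

rates = positive_rate_by_group(decisions)
ratio = rate_ratio(rates)
THRESHOLD = 0.8  # illustrative cut-off, not a universal standard
flagged = ratio < THRESHOLD
print(rates, f"ratio={ratio:.2f}", "flag for review" if flagged else "ok")
```

A low ratio does not prove that a model is unfair, but it is a signal that the results and the training data deserve a closer look.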


HIDDEN BIAS | 143


Figure labels: Human user; AI agent. Search: power station, oil company, climate change. Interest profile: Threat!


MAKING ASSUMPTIONS 


Using personal data to try to predict an individual's future desires,
opinions, and activities is known as “profiling.” In AI, machine-learning
tools can be trained on large data sets to become expert at predicting,
for example, the kind of internet content a user might like to see
based on their viewing history. However, profiling can be problematic,
since it can lead to false and even damaging predictions due to biases
built into data sets and algorithms (see p.143). In order to root out
such biases, it is essential that AI decision-making processes
be transparent (see opposite).


144 | AI PROFILING


TRANSPARENT PROCESSING 


Machine-learning models process data and make predictions
using highly complex artificial neural networks (ANNs, see
p.76). The inner workings of these models are often said to be
a “black box” because they are too complicated and abstract
for humans to “observe.” This means that the results they
produce cannot be properly understood and checked for
errors or biases. An alternative approach, known as
“interpretable machine learning,” or “white box AI,” shines a
light into the black box. White box AIs are designed to give
not just the result, but a breakdown of the processes
they followed to reach it.
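
The idea of returning a result together with its breakdown can be sketched in a few lines of Python. This is a deliberately simple, rule-based stand-in rather than a real machine-learning model, and the function name, the applicant fields, and the scoring rules are all invented for illustration.

```python
def white_box_decision(applicant):
    """Score an applicant with simple rules and record every step taken."""
    steps = []
    score = 0

    if applicant["income"] >= 30000:
        score += 2
        steps.append("income >= 30000: +2")
    else:
        steps.append("income < 30000: +0")

    if applicant["missed_payments"] == 0:
        score += 1
        steps.append("no missed payments: +1")
    else:
        score -= 2
        steps.append("missed payments on record: -2")

    approved = score >= 2
    steps.append(f"total score {score}, approval needs 2 or more")
    return approved, steps


approved, explanation = white_box_decision(
    {"income": 42000, "missed_payments": 1}
)
print("approved" if approved else "declined")
for step in explanation:
    print(" -", step)
```

Because every step is listed, a person reviewing the decision can see which rule tipped the balance and challenge it if it looks wrong.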


WHITE BOX
Since it is open to human
inspection, the processes can
be checked and improved.

BLACK BOX
Since the AI's processes cannot
be accessed, it is harder to
question its results.


WHITE BOX AI | 145


AN AI
WORKFORCE


The replacement of human beings by machines in the 
workforce is known as “technological unemployment.” Up until
now, this phenomenon has not led to mass unemployment, because
machines greatly increase productivity, which in turn stimulates the
economy and creates new job opportunities. However, if AIs begin to
pass the employment test (see p.132), and achieve the intelligence level
of AGIs (see p.126), then, one day, there may be few jobs left for human
beings to do. Under such circumstances, the challenge for 
governments will be how to support the masses of unemployed 
people, which may include providing universal income—a regular 
payment to each member of society. 


146 | TECHNOLOGICAL UNEMPLOYMENT 


Balance of power
The democratic
approach to AI aims
to ensure that the
technology benefits
everyone, rather than
a rich, powerful elite.


THE AI BALANCE




AI has the potential to increase productivity and generate
income and opportunity. Shared by all, these benefits could
create a more equal world, but if concentrated in the hands
of the wealthy and powerful, the gulf between rich and poor
will widen. Bias in design, data, and how and where AIs are used
can exacerbate social divides, increase inequality, and lead to
hazardous and discriminatory applications. Attempts to mitigate
these risks include inclusive design and embedding AIs with
values, such as fairness and accountability.


AI AND EQUALITY | 147


Different shades
of opinion are
excluded.


AI algorithms are increasingly
used to curate the content people 
see online—for example, on social media. An 
unintended consequence of this has been the 
creation of “filter bubbles,” whereby people 
are shown only content that tallies with or 
amplifies their own opinions, while alternative 
views get filtered out. This occurs due to 
“recommendation algorithms” (see p.95)
repeatedly showing users material similar 
to what they have viewed in the past, 
encouraging biased thinking. 
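
How quickly this feedback loop narrows what a user sees can be shown with a toy simulation. The sketch below is an assumption-laden illustration, not how any real platform works: the catalogue, the viewpoint labels, and the rule that the user views everything shown are all invented.

```python
from collections import Counter

# Invented catalogue: each item is tagged with one viewpoint.
CATALOGUE = ["left", "center", "right"] * 4


def recommend(history, k=3):
    """Show the k items whose viewpoint the user has viewed most often."""
    counts = Counter(history)
    ranked = sorted(CATALOGUE, key=lambda view: counts[view], reverse=True)
    return ranked[:k]


history = ["left", "center", "left"]  # a mild initial preference
for _ in range(5):
    shown = recommend(history)
    history.extend(shown)  # the user views whatever is shown

print(Counter(history))
print("viewpoints in final feed:", set(recommend(history)))
```

A slight initial leaning is enough for the feed to settle on a single viewpoint, because each recommendation adds to the very history that drives the next one.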


Compatible
opinions are
the only ones
that make it
through.


ALLOWED IN 


148 | FILTER BUBBLES 


THE LIMITS OF CONTROL


The rogue AIs of dystopian science fiction are imaginary,
yet at their root lies a serious issue: the problem of control.
If an AI is to reach its full usefulness, it will need to be
autonomous to an extent, capable of independent decision-making.
However, the more autonomous and powerful an AI becomes, the harder
it will be to control. A fully autonomous AI might be able to ignore or
contradict the instructions of its controllers, and even take active
steps to maintain its independence. Once an AI is beyond human
influence and restraint, its behavior would be unpredictable.


AI AUTONOMY | 149


RIGHT VS. WRONG 


As AIs become ever more intelligent, the question of how
to ensure that they behave ethically becomes increasingly
important (see opposite). Machine-learning tools have
neither agency nor values, and so cannot be relied upon to
offer suggestions that are in the best interests of humanity,
or that do not favor one social group over another. The only way
to ensure that AIs think ethically is to program them with
ethical principles, although then the question becomes:
whose ethics? Ideally, an AI should have equal respect for all
humans, and be able to detect and compensate for bias.


UNETHICAL DESIGN

Black box
Decision-making is not transparent. People cannot see
why the AI has made the decision it has.

Algorithm bias
Bias is designed into the AI, and those who control it
have the most power.

ETHICAL DESIGN

White box
Decision-making is transparent. How an AI makes its
decisions can be seen and judged.

Algorithm fairness
Bias is designed out of the AI at every stage, from data
collection to final application.


150 | ETHICAL DESIGN 


BUILT-IN 
ETHICS 


One way of ensuring that AIs behave ethically (see opposite) is to
program them with specific ethical rules or laws—a process known 
as “terminal value loading.” The classic illustration of this can be 
found in the science fiction stories of Isaac Asimov (1920-92), who 
formulated what he called the “three laws of robotics” (see above). 
However, as his stories explore, terminal value loading is far from 
foolproof, since even the simplest laws can generate contradictions. 
For example, an AI may be instructed not to harm a human being,
but doing so may be the only way of saving a person's life. 
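
The contradiction can be made concrete with a short sketch. The two rules below loosely paraphrase the two halves of Asimov's First Law, and the scenario, action names, and outcome flags are invented for illustration; this is not how a real AI would be programmed.

```python
# Two hard-coded rules, paraphrasing the two halves of Asimov's First Law.
RULES = {
    "never injure a human": lambda o: not o["injures_human"],
    "never allow harm through inaction": lambda o: not o["human_left_in_danger"],
}

# Invented scenario: emergency surgery is the only way to save a patient.
OUTCOMES = {
    "operate": {"injures_human": True, "human_left_in_danger": False},
    "do nothing": {"injures_human": False, "human_left_in_danger": True},
}


def violated_rules(action):
    """List every rule that the given action would break."""
    outcome = OUTCOMES[action]
    return [name for name, allows in RULES.items() if not allows(outcome)]


permitted = [a for a in OUTCOMES if not violated_rules(a)]
for action in OUTCOMES:
    print(action, "->", violated_rules(action))
print("permitted actions:", permitted)
```

Every available action breaks one of the rules, so a system that treats both as absolute is left with nothing it is allowed to do.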


ASIMOV'S THREE LAWS | 151 


Some researchers argue that, one day, AIs will
not only be as intelligent as human beings, they
will also have humanlike personalities, and so
should be granted human rights (see p.135).
If an AI is given such rights, lawmakers would
have to decide where to draw the line between
holding the AI or its makers responsible for its
actions. If the AI is deemed culpable for breaking
the law (in other words, if it acted of its
own free will), then it would have to suffer the
appropriate sanctions or punishments for its
actions. Like a human being, it could also be
required to make amends for what it has done,
and be open to reforming its character.


WHO IS TO BLAME? 


152 | AI AND LIABILITY


Figure labels: Highly regulated (AIs involved with safety, law, ...); Partially regulated (... emotions ...); Unregulated (... games and spam filters).


Concerns about the dangers that AIs may pose in the
future have fueled calls for AI research to be regulated.
However, many scientists argue that regulating research
will stifle innovation, and give unregulated countries a
dangerous advantage. A compromise, proposed by European
regulators, is to scale regulation according to risk. Low-risk
applications of AI should have little or no regulation; high-risk
applications should be controlled; and the most
risky applications should be forbidden.


AI AND REGULATION | 153


EXISTENTIAL RISKS 


One possible threat posed by AI is known as the “alignment
problem,” whereby the goals and values of an AI do not align
with those of humanity. Named after a scene in the Disney
cartoon Fantasia, in which a sorcerer’s apprentice makes
a broom multiply uncontrollably, “Sorcerer's Apprentice
Syndrome” neatly illustrates the problem in the form of a
thought experiment. An AI is given the task of optimizing
the production of paperclips, but believes that its job is only
done when it has converted the entire planet into paperclips.
It does so because it does not realize that it must prioritize
human life over paperclip production.
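
The thought experiment can be reduced to a toy example of a badly specified goal. Everything in the sketch below is invented for illustration, including the actions, the numbers, and the function names, and the single safety check stands in for a problem that is far harder to solve in practice.

```python
# Invented actions with invented outcomes, for illustration only.
ACTIONS = {
    "run one factory": {"paperclips": 1_000, "harm_to_humans": 0},
    "run every factory": {"paperclips": 50_000, "harm_to_humans": 0},
    "convert the planet": {"paperclips": 10**12, "harm_to_humans": 10},
}


def naive_objective(outcome):
    """Counts paperclips and nothing else."""
    return outcome["paperclips"]


def constrained_objective(outcome):
    """Rules out any action that harms humans before counting paperclips."""
    if outcome["harm_to_humans"] > 0:
        return float("-inf")
    return outcome["paperclips"]


def best_action(objective):
    """Pick the action that scores highest under the given objective."""
    return max(ACTIONS, key=lambda name: objective(ACTIONS[name]))


print("naive choice:      ", best_action(naive_objective))
print("constrained choice:", best_action(constrained_objective))
```

The optimizer is the same in both cases. What changes its choice is whether the goal it was given mentions human welfare at all.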




154 | AN AI DYSTOPIA


UNLIMITED REWARDS 


Many AI researchers believe that AI will usher in a golden age

for humanity—a time when machines will generate limitless 
abundance and prosperity. They argue that, with more powerful 
AIs doing all the work that humans used to do, people will finally
be free to devote their time to leisure activities and to pursuing
their personal dreams. At such a time, they claim, there will be no
scarcity of resources, and so no crime, war, or injustice, and AIs
will be able to help us to solve the world’s remaining problems,
from disease to climate change. 


AN AI UTOPIA | 155



ACKNOWLEDGMENTS


DK would like to thank the following for their help
with this book: Vanessa Hamilton, Mark Lloyd, and
Lee Riches for illustrations; Alexandra Beeden for
proofreading; Helen Peters for the index;
{eet Hane for Americanization;
Jackets Coordinator Priyanka Sharma.

All images © Dorling Kindersley
For further information see: www.dkimages.com


160 | INDEX 


